9 releases

0.3.0	Feb 12, 2023
0.2.3	Mar 8, 2022
0.2.2	Jan 21, 2022
0.2.1	Dec 27, 2021
0.1.3	Oct 31, 2021

#1146 in Text processing

28 downloads per month
Used in 3 crates (2 directly)

MIT/Apache

46KB
493 lines

Regex for Humans

The goal of this crate is simple: give everybody the power of regular expressions without having to learn the complicated syntax. It is inspired by ReadableRegex.jl. This crate is a wrapper around the core Rust regex library.

Example usage

If you want to match a date of the format 2021-10-30, you could use the following code to generate a regex:

```rust
use human_regex::{beginning, digit, exactly, text, end};
let regex_string = beginning()
    + exactly(4, digit())
    + text("-")
    + exactly(2, digit())
    + text("-")
    + exactly(2, digit())
    + end();
assert!(regex_string.to_regex().is_match("2014-01-01"));
```

The to_regex() method returns a standard Rust regex. We can do this another way with slightly less repetition though!

```rust
use human_regex::{beginning, digit, exactly, text, end};
let first_regex_string = text("-") + exactly(2, digit());
let second_regex_string = beginning()
    + exactly(4, digit())
    + exactly(2, first_regex_string)
    + end();
assert!(second_regex_string.to_regex().is_match("2014-01-01"));
```
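
To make the composition idea concrete, here is a minimal, standard-library-only sketch of how a `+`-based builder could assemble a pattern string. The `Pattern` type and the free functions are hypothetical stand-ins written for illustration; they are not the crate's real types, and the string the crate actually generates may differ.

```rust
use std::ops::Add;

/// Hypothetical stand-in for a pattern builder (not the crate's real type).
#[derive(Debug, Clone, PartialEq)]
struct Pattern(String);

impl Add for Pattern {
    type Output = Pattern;
    // Concatenation: `a + b` matches `a` followed by `b`.
    fn add(self, rhs: Pattern) -> Pattern {
        Pattern(format!("{}{}", self.0, rhs.0))
    }
}

fn beginning() -> Pattern { Pattern("^".to_string()) }
fn end() -> Pattern { Pattern("$".to_string()) }
fn digit() -> Pattern { Pattern(r"\d".to_string()) }

/// Escapes regex metacharacters so the text is matched literally.
fn text(literal: &str) -> Pattern {
    let mut escaped = String::new();
    for c in literal.chars() {
        if r"\.+*?()|[]{}^$#&-~".contains(c) {
            escaped.push('\\');
        }
        escaped.push(c);
    }
    Pattern(escaped)
}

/// Wraps the inner pattern in a non-capturing group before quantifying it,
/// so the count applies to the whole sub-pattern rather than its last atom.
fn exactly(n: u32, inner: Pattern) -> Pattern {
    Pattern(format!("(?:{}){{{}}}", inner.0, n))
}

/// Builds the date pattern from the second example above.
fn date_pattern() -> Pattern {
    let month_or_day = text("-") + exactly(2, digit());
    beginning() + exactly(4, digit()) + exactly(2, month_or_day) + end()
}
```

In this sketch the non-capturing group is what lets `exactly(2, ...)` repeat a whole sub-pattern, which is why reusing `first_regex_string` in the example above works without extra parentheses from the caller.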

For a more extensive set of examples, please see The Cookbook.

Features

This crate currently supports the vast majority of syntax available in the core Rust regex library through a human-readable API.

Single Character

Implemented?	Expression	Description
`any()`	`.`	any character except new line (includes new line with s flag)
`digit()`	`\d`	digit (`\p{Nd}`)
`non_digit()`	`\D`	not digit
`unicode_category(UnicodeCategory)`	`\p{L}`	Unicode non-script category
`unicode_script(UnicodeScript)`	`\p{Greek}`	Unicode script category
`non_unicode_category(UnicodeCategory)`	`\P{L}`	Negated one-letter name Unicode character class
`non_unicode_script(UnicodeScript)`	`\P{Greek}`	negated Unicode character class (general category or script)

Character Classes

Implemented?	Expression	Description
`or(&['x', 'y', 'z'])`	`[xyz]`	A character class matching either x, y or z (union).
`nor(&['x', 'y', 'z'])`	`[^xyz]`	A character class matching any character except x, y and z.
`within('a'..='z')`	`[a-z]`	A character class matching any character in range a-z.
`without('a'..='z')`	`[^a-z]`	A character class matching any character outside range a-z.
See below	`[[:alpha:]]`	ASCII character class (`[A-Za-z]`)
`non_alphabetic()`	`[[:^alpha:]]`	Negated ASCII character class (`[^A-Za-z]`)
`or()`	`[x[^xyz]]`	Nested/grouping character class (matching any character except y and z)
`and(&[])`/`&`	`[a-y&&xyz]`	Intersection (a-y AND xyz = xy)
`(within('0'..='9') & nor(&['4']))`	`[0-9&&[^4]]`	Subtraction using intersection and negation (matching 0-9 except 4)
`subtract(&[],&[])`	`[0-9--4]`	Direct subtraction (matching 0-9 except 4). Use `.collect::<Vec<char>>()` to use ranges.
`xor(&[],&[])`	`[a-g~~b-h]`	Symmetric difference (matching `a` and `h` only). Requires .collect() for ranges.
`or(&escape_all(&['[',']']))`	`[\[\]]`	Escaping in character classes (matching `[` or `]`)
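
The set operations in this table can be checked by hand with ordinary sets of characters. The following sketch uses only the standard library (`BTreeSet`) and does not call the crate; the helper names are made up for illustration. It also shows why a range has to be collected into a concrete collection before it can be combined with another class.

```rust
use std::collections::BTreeSet;
use std::ops::RangeInclusive;

/// Collects an inclusive character range such as 'a'..='y' into a set.
fn class(range: RangeInclusive<char>) -> BTreeSet<char> {
    range.collect()
}

/// `[a-y&&xyz]`: characters present in both classes.
fn intersection(a: &BTreeSet<char>, b: &BTreeSet<char>) -> BTreeSet<char> {
    a.intersection(b).copied().collect()
}

/// `[0-9--4]`: characters in the first class that are not in the second.
fn difference(a: &BTreeSet<char>, b: &BTreeSet<char>) -> BTreeSet<char> {
    a.difference(b).copied().collect()
}

/// `[a-g~~b-h]`: characters in exactly one of the two classes.
fn symmetric_difference(a: &BTreeSet<char>, b: &BTreeSet<char>) -> BTreeSet<char> {
    a.symmetric_difference(b).copied().collect()
}

/// Renders a set as a string in code-point order, for easy comparison.
fn members(set: &BTreeSet<char>) -> String {
    set.iter().collect()
}
```

For the rows above, `members(&intersection(&class('a'..='y'), &class('x'..='z')))` gives the two characters `x` and `y`, and the symmetric difference of `a-g` and `b-h` leaves only `a` and `h`.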

Perl Character Classes

Implemented?	Expression	Description
`digit()`	`\d`	digit (`\p{Nd}`)
`non_digit()`	`\D`	not digit
`whitespace()`	`\s`	whitespace (`\p{White_Space}`)
`non_whitespace()`	`\S`	not whitespace
`word()`	`\w`	word character (`\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control}`)
`non_word()`	`\W`	not word character

ASCII Character Classes

Implemented?	Expression	Description
`alphanumeric()`	`[[:alnum:]]`	alphanumeric (`[0-9A-Za-z]`)
`alphabetic()`	`[[:alpha:]]`	alphabetic (`[A-Za-z]`)
`ascii()`	`[[:ascii:]]`	ASCII (`[\x00-\x7F]`)
`blank()`	`[[:blank:]]`	blank (`[\t ]`)
`control()`	`[[:cntrl:]]`	control (`[\x00-\x1F\x7F]`)
`digit()`	`[[:digit:]]`	digits (`[0-9]`)
`graphical()`	`[[:graph:]]`	graphical (`[!-~]`)
`lowercase()`	`[[:lower:]]`	lower case (`[a-z]`)
`printable()`	`[[:print:]]`	printable (`[ -~]`)
`punctuation()`	`[[:punct:]]`	punctuation (``[!-/:-@\[-`{-~]``)
`whitespace()`	`[[:space:]]`	whitespace (`[\t\n\v\f\r ]`)
`uppercase()`	`[[:upper:]]`	upper case (`[A-Z]`)
`word()`	`[[:word:]]`	word characters (`[0-9A-Za-z_]`)
`hexdigit()`	`[[:xdigit:]]`	hex digit (`[0-9A-Fa-f]`)
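
As a cross-check on the ranges listed in this table, the Rust standard library has `char::is_ascii_*` predicates that cover most of the same classes. The sketch below enumerates the ASCII range with those predicates; it is independent of this crate, and `ascii_members` and `posix_space` are hypothetical helper names. One difference worth knowing: `is_ascii_whitespace` does not include vertical tab (`\v`, 0x0B), whereas `[[:space:]]` does.

```rust
/// Returns every ASCII character (0x00..=0x7F) accepted by `pred`,
/// in code-point order.
fn ascii_members(pred: fn(&char) -> bool) -> String {
    (0u8..=0x7F).map(char::from).filter(pred).collect()
}

/// `[[:space:]]` is `[\t\n\v\f\r ]`; the std predicate omits `\v` (0x0B),
/// so it is added back here.
fn posix_space(c: &char) -> bool {
    c.is_ascii_whitespace() || *c == '\x0B'
}

/// `[[:blank:]]` is just tab and space.
fn posix_blank(c: &char) -> bool {
    *c == '\t' || *c == ' '
}
```

For example, `ascii_members(char::is_ascii_hexdigit)` lists the digits, then `A` to `F`, then `a` to `f`, which is the `[0-9A-Fa-f]` range shown for `[[:xdigit:]]`.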

Repetitions

Implemented?	Expression	Description
`zero_or_more(x)`	`x*`	zero or more of x (greedy)
`one_or_more(x)`	`x+`	one or more of x (greedy)
`zero_or_one(x)`	`x?`	zero or one of x (greedy)
`zero_or_more(x).lazy()`	`x*?`	zero or more of x (ungreedy/lazy)
`one_or_more(x).lazy()`	`x+?`	one or more of x (ungreedy/lazy)
`zero_or_one(x).lazy()`	`x??`	zero or one of x (ungreedy/lazy)
`between(n, m, x)`	`x{n,m}`	at least n x and at most m x (greedy)
`at_least(n, x)`	`x{n,}`	at least n x (greedy)
`exactly(n, x)`	`x{n}`	exactly n x
`between(n, m, x).lazy()`	`x{n,m}?`	at least n x and at most m x (ungreedy/lazy)
`at_least(n, x).lazy()`	`x{n,}?`	at least n x (ungreedy/lazy)
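
The pattern in this table is that every lazy form is the greedy form with a trailing `?`. A small sketch of that mapping, using only the standard library, might look like the following. `Count` and `Quantified` are hypothetical types invented for this illustration and are not the crate's API.

```rust
/// Hypothetical quantifier kinds mirroring the rows of the table.
#[derive(Clone, Copy)]
enum Count {
    ZeroOrMore,
    OneOrMore,
    ZeroOrOne,
    Between(u32, u32),
    AtLeast(u32),
    Exactly(u32),
}

/// A quantified sub-pattern; greedy unless `lazy()` has been called.
struct Quantified {
    inner: String,
    count: Count,
    lazy: bool,
}

impl Quantified {
    fn new(inner: &str, count: Count) -> Self {
        Quantified { inner: inner.to_string(), count, lazy: false }
    }

    /// Switches the quantifier to its ungreedy form.
    fn lazy(mut self) -> Self {
        self.lazy = true;
        self
    }

    /// Renders the quantifier suffix, then appends `?` when lazy.
    fn to_pattern(&self) -> String {
        let suffix = match self.count {
            Count::ZeroOrMore => "*".to_string(),
            Count::OneOrMore => "+".to_string(),
            Count::ZeroOrOne => "?".to_string(),
            Count::Between(n, m) => format!("{{{},{}}}", n, m),
            Count::AtLeast(n) => format!("{{{},}}", n),
            Count::Exactly(n) => format!("{{{}}}", n),
        };
        let laziness = if self.lazy { "?" } else { "" };
        format!("(?:{}){}{}", self.inner, suffix, laziness)
    }
}
```

Keeping laziness as a separate switch, rather than a second set of functions, is what lets one method cover all the lazy rows.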

Composites

Implemented?	Expression	Description
`+`	`xy`	concatenation (x followed by y)
`or()`	`x|y`	alternation (x or y, prefer x)

Empty matches

Implemented?	Expression	Description
`beginning()`	`^`	the beginning of text (or start-of-line with multi-line mode)
`end()`	`$`	the end of text (or end-of-line with multi-line mode)
`beginning_of_text()`	`\A`	only the beginning of text (even with multi-line mode enabled)
`end_of_text()`	`\z`	only the end of text (even with multi-line mode enabled)
`word_boundary()`	`\b`	a Unicode word boundary (\w on one side and \W, \A, or \z on other)
`non_word_boundary()`	`\B`	not a Unicode word boundary

Groupings

Implemented?	Expression	Description
`capture(exp)`	`(exp)`	numbered capture group (indexed by opening parenthesis)
`named_capture(exp, name)`	`(?P<name>exp)`	named (also numbered) capture group
Handled implicitly through functional composition	`(?:exp)`	non-capturing group
See below	`(?flags)`	set flags within current group
See below	`(?flags:exp)`	set flags for exp (non-capturing)

Flags

Implemented?	Expression	Description
`case_insensitive(exp)`	`i`	case-insensitive: letters match both upper and lower case
`multi_line_mode(exp)`	`m`	multi-line mode: `^` and `$` match begin/end of line
`dot_matches_newline_too(exp)`	`s`	allow `.` to match `\n`
will not be implemented¹	`U`	swap the meaning of `x` and `x?`
`disable_unicode(exp)`	`u`	Unicode support (enabled by default)
will not be implemented²	`x`	ignore whitespace and allow line comments (starting with `#`)

¹ With the declarative nature of this library, use of this flag would just obfuscate meaning.
² When using human_regex, comments should be added in source code rather than in the regex string.
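
Because each flag function takes an expression, a flag applies only to the expression it wraps, which corresponds to the `(?flags:exp)` form listed under Groupings. The sketch below shows one way such wrappers could render to pattern strings. The functions reuse the names from the table but are hypothetical re-implementations over plain `&str`; the crate's own functions operate on its builder type, and their exact output is not shown here.

```rust
/// Wraps `exp` in a non-capturing group that carries the given flags,
/// i.e. the `(?flags:exp)` form.
fn with_flags(flags: &str, exp: &str) -> String {
    format!("(?{}:{})", flags, exp)
}

/// `i`: letters match both upper and lower case.
fn case_insensitive(exp: &str) -> String {
    with_flags("i", exp)
}

/// `m`: `^` and `$` match at line boundaries.
fn multi_line_mode(exp: &str) -> String {
    with_flags("m", exp)
}

/// `s`: `.` also matches `\n`.
fn dot_matches_newline_too(exp: &str) -> String {
    with_flags("s", exp)
}

/// `-u`: Unicode support is on by default, so disabling it negates the flag.
fn disable_unicode(exp: &str) -> String {
    with_flags("-u", exp)
}
```

Wrappers like these nest naturally: `multi_line_mode(&case_insensitive("^a$"))` renders as `(?m:(?i:^a$))` in this sketch, with each flag scoped to its own group.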

Dependencies

~2.1–3MB
~54K SLoC