Regex for Humans
The goal of this crate is simple: give everybody the power of regular expressions without having
to learn the complicated syntax. It is inspired by ReadableRegex.jl.
This crate is a wrapper around the core Rust regex library.
Example usage
If you want to match a date of the format 2021-10-30, you could use the following code to generate a regex:
use human_regex::{beginning, digit, exactly, text, end};
// Four digits, a dash, two digits, a dash, two digits, anchored at both ends.
let regex_string = beginning()
    + exactly(4, digit())
    + text("-")
    + exactly(2, digit())
    + text("-")
    + exactly(2, digit())
    + end();
assert!(regex_string.to_regex().is_match("2014-01-01"));
The to_regex() method returns a standard Rust regex. The same pattern can be written with slightly less repetition:
use human_regex::{beginning, digit, exactly, text, end};
// A dash followed by two digits; this unit appears twice in the date.
let first_regex_string = text("-") + exactly(2, digit());
let second_regex_string = beginning()
    + exactly(4, digit())
    + exactly(2, first_regex_string)
    + end();
assert!(second_regex_string.to_regex().is_match("2014-01-01"));
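To make the composition above concrete, here is a minimal, std-only sketch of how a builder of this kind could assemble a pattern string. The Pattern type and every function body below are hypothetical stand-ins written for illustration. They are not the crate's actual types or implementation, and the exact string human_regex generates may differ (for example, in how it groups sub-expressions).

```rust
use std::ops::Add;

/// Hypothetical stand-in for a pattern builder: a newtype around regex source text.
#[derive(Debug, Clone, PartialEq)]
struct Pattern(String);

/// `a + b` concatenates the two pattern strings.
impl Add for Pattern {
    type Output = Pattern;
    fn add(self, rhs: Pattern) -> Pattern {
        Pattern(self.0 + &rhs.0)
    }
}

fn beginning() -> Pattern {
    Pattern("^".to_string())
}

fn end() -> Pattern {
    Pattern("$".to_string())
}

fn digit() -> Pattern {
    Pattern(r"\d".to_string())
}

/// Literal text: backslash-escape the common regex metacharacters.
fn text(s: &str) -> Pattern {
    let mut out = String::new();
    for c in s.chars() {
        if r"\.+*?()|[]{}^$".contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    Pattern(out)
}

/// Repeat `target` exactly `n` times. The non-capturing group makes a
/// multi-token target (such as "-" followed by two digits) repeat as a unit.
fn exactly(n: u32, target: Pattern) -> Pattern {
    Pattern(format!("(?:{}){{{}}}", target.0, n))
}

/// The date pattern from the second example, built with the sketch helpers.
fn date_pattern() -> Pattern {
    beginning()
        + exactly(4, digit())
        + exactly(2, text("-") + exactly(2, digit()))
        + end()
}
```

In this sketch, date_pattern() yields a single pattern string in which each repeated unit sits inside its own non-capturing group, which is why nesting exactly() calls composes safely.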
For a more extensive set of examples, please see The Cookbook.
Features
This crate currently supports the vast majority of syntax available in the core Rust regex library through a human-readable API.
Single Character
Implemented? | Expression | Description |
any() | . | any character except new line (includes new line with s flag) |
digit() | \d | digit (\p{Nd}) |
non_digit() | \D | not digit |
unicode_category(UnicodeCategory) | \p{L} | Unicode non-script category |
unicode_script(UnicodeScript) | \p{Greek} | Unicode script category |
non_unicode_category(UnicodeCategory) | \P{L} | negated one-letter name Unicode character class |
non_unicode_script(UnicodeScript) | \P{Greek} | negated Unicode character class (general category or script) |
Character Classes
Implemented? | Expression | Description |
or(&['x', 'y', 'z']) | [xyz] | A character class matching either x, y or z (union). |
nor(&['x', 'y', 'z']) | [^xyz] | A character class matching any character except x, y and z. |
within('a'..='z') | [a-z] | A character class matching any character in range a-z. |
without('a'..='z') | [^a-z] | A character class matching any character outside range a-z. |
See below | [[:alpha:]] | ASCII character class ([A-Za-z]) |
non_alphanumeric() | [[:^alpha:]] | Negated ASCII character class ([^A-Za-z]) |
or() | [x[^xyz]] | Nested/grouping character class (matching any character except y and z) |
and(&[]) / & | [a-y&&xyz] | Intersection (a-y AND xyz = xy) |
(or[1,2,3,4] & nor(3)) | [0-9&&[^4]] | Subtraction using intersection and negation (matching 0-9 except 4) |
subtract(&[],&[]) | [0-9--4] | Direct subtraction (matching 0-9 except 4). Use .collect::<Vec<char>>() to use ranges. |
xor(&[],&[]) | [a-g~~b-h] | Symmetric difference (matching a and h only). Requires .collect() for ranges. |
or(&escape_all(&['[',']'])) | [\[\]] | Escaping in character classes (matching [ or ]) |
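The rows above map small sets or ranges of characters to bracketed classes. As a rough, std-only illustration of that mapping, the sketch below renders a few classes as strings. The function names (class_of, class_not_of, class_range, escape_in_class) are hypothetical and chosen to avoid any confusion with the crate's real or(), nor() and within(); the crate's own implementation may differ.

```rust
use std::ops::RangeInclusive;

/// Backslash-escape characters that have special meaning inside `[...]`.
fn escape_in_class(c: char) -> String {
    match c {
        '[' | ']' | '\\' | '^' | '-' | '&' | '~' => format!("\\{}", c),
        _ => c.to_string(),
    }
}

/// Render a union class such as `[xyz]`.
fn class_of(chars: &[char]) -> String {
    let body: String = chars.iter().map(|&c| escape_in_class(c)).collect();
    format!("[{}]", body)
}

/// Render a negated class such as `[^xyz]`.
fn class_not_of(chars: &[char]) -> String {
    let body: String = chars.iter().map(|&c| escape_in_class(c)).collect();
    format!("[^{}]", body)
}

/// Render an inclusive range such as `[a-z]`.
fn class_range(range: RangeInclusive<char>) -> String {
    format!(
        "[{}-{}]",
        escape_in_class(*range.start()),
        escape_in_class(*range.end())
    )
}
```

For instance, class_of(&['[', ']']) escapes both brackets, which corresponds to the last row of the table.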
Perl Character Classes
Implemented? | Expression | Description |
digit() | \d | digit (\p{Nd}) |
non_digit() | \D | not digit |
whitespace() | \s | whitespace (\p{White_Space}) |
non_whitespace() | \S | not whitespace |
word() | \w | word character (\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control}) |
non_word() | \W | not word character |
ASCII Character Classes
Implemented? | Expression | Description |
alphanumeric() | [[:alnum:]] | alphanumeric ([0-9A-Za-z]) |
alphabetic() | [[:alpha:]] | alphabetic ([A-Za-z]) |
ascii() | [[:ascii:]] | ASCII ([\x00-\x7F]) |
blank() | [[:blank:]] | blank ([\t ]) |
control() | [[:cntrl:]] | control ([\x00-\x1F\x7F]) |
digit() | [[:digit:]] | digits ([0-9]) |
graphical() | [[:graph:]] | graphical ([!-~]) |
lowercase() | [[:lower:]] | lower case ([a-z]) |
printable() | [[:print:]] | printable ([ -~]) |
punctuation() | [[:punct:]] | punctuation ([!-/:-@\[-`{-~]) |
whitespace() | [[:space:]] | whitespace ([\t\n\v\f\r ]) |
uppercase() | [[:upper:]] | upper case ([A-Z]) |
word() | [[:word:]] | word characters ([0-9A-Za-z_]) |
hexdigit() | [[:xdigit:]] | hex digit ([0-9A-Fa-f]) |
Repetitions
Implemented? | Expression | Description |
zero_or_more(x) | x* | zero or more of x (greedy) |
one_or_more(x) | x+ | one or more of x (greedy) |
zero_or_one(x) | x? | zero or one of x (greedy) |
zero_or_more(x).lazy() | x*? | zero or more of x (ungreedy/lazy) |
one_or_more(x).lazy() | x+? | one or more of x (ungreedy/lazy) |
zero_or_one(x).lazy() | x?? | zero or one of x (ungreedy/lazy) |
between(n, m, x) | x{n,m} | at least n x and at most m x (greedy) |
at_least(n, x) | x{n,} | at least n x (greedy) |
exactly(n, x) | x{n} | exactly n x |
between(n, m, x).lazy() | x{n,m}? | at least n x and at most m x (ungreedy/lazy) |
at_least(n, x).lazy() | x{n,}? | at least n x (ungreedy/lazy) |
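Every lazy row in the table is its greedy counterpart with a trailing ?. The std-only sketch below illustrates that relationship with a hypothetical Quantified type whose lazy() method appends the marker. It is an assumption-laden illustration, not the crate's implementation; in particular, wrapping the operand in a non-capturing group is a choice made here so that multi-character operands repeat as a unit.

```rust
/// Hypothetical wrapper around a quantified pattern string.
#[derive(Debug, Clone, PartialEq)]
struct Quantified(String);

impl Quantified {
    /// Turn a greedy quantifier into its lazy form by appending `?`.
    fn lazy(self) -> Quantified {
        Quantified(self.0 + "?")
    }
}

fn zero_or_more(x: &str) -> Quantified {
    Quantified(format!("(?:{})*", x))
}

fn one_or_more(x: &str) -> Quantified {
    Quantified(format!("(?:{})+", x))
}

fn zero_or_one(x: &str) -> Quantified {
    Quantified(format!("(?:{})?", x))
}

/// At least `n` and at most `m` repetitions of `x`.
fn between(n: u32, m: u32, x: &str) -> Quantified {
    Quantified(format!("(?:{}){{{},{}}}", x, n, m))
}

/// At least `n` repetitions of `x`.
fn at_least(n: u32, x: &str) -> Quantified {
    Quantified(format!("(?:{}){{{},}}", x, n))
}
```

Because lazy() consumes and returns the wrapper, it chains after any quantifier in the same way the table shows, for example between(2, 3, "x").lazy().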
Composites
Implemented? | Expression | Description |
+ | xy | concatenation (x followed by y) |
or() | x|y | alternation (x or y, prefer x) |
Empty matches
Implemented? | Expression | Description |
beginning() | ^ | the beginning of text (or start-of-line with multi-line mode) |
end() | $ | the end of text (or end-of-line with multi-line mode) |
beginning_of_text() | \A | only the beginning of text (even with multi-line mode enabled) |
end_of_text() | \z | only the end of text (even with multi-line mode enabled) |
word_boundary() | \b | a Unicode word boundary (\w on one side and \W, \A, or \z on other) |
non_word_boundary() | \B | not a Unicode word boundary |
Groupings
Implemented? | Expression | Description |
capture(exp) | (exp) | numbered capture group (indexed by opening parenthesis) |
named_capture(exp, name) | (?P<name>exp) | named (also numbered) capture group |
Handled implicitly through functional composition | (?:exp) | non-capturing group |
See below | (?flags) | set flags within current group |
See below | (?flags:exp) | set flags for exp (non-capturing) |
Flags
Implemented? | Expression | Description |
case_insensitive(exp) | i | case-insensitive: letters match both upper and lower case |
multi_line_mode(exp) | m | multi-line mode: ^ and $ match begin/end of line |
dot_matches_newline_too(exp) | s | allow . to match \n |
will not be implemented [1] | U | swap the meaning of x* and x*? |
disable_unicode(exp) | u | Unicode support (enabled by default) |
will not be implemented [2] | x | ignore whitespace and allow line comments (starting with #) |
[1] With the declarative nature of this library, use of this flag would just obfuscate meaning.
[2] When using human_regex, comments should be added in source code rather than in the regex string.
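Each flag helper in the table takes an expression, which suggests that a flag applies only to that expression, as in the (?flags:exp) form listed under Groupings. The std-only sketch below shows what such scoping could look like as plain strings. The names ending in _sketch and the scoped_flags helper are hypothetical, and the mapping of disable_unicode to -u is an assumption based on the standard regex flag syntax rather than on the crate's source.

```rust
/// Wrap `exp` in a non-capturing group with `flags` active only inside it.
fn scoped_flags(flags: &str, exp: &str) -> String {
    format!("(?{}:{})", flags, exp)
}

/// Letters in `exp` match both upper and lower case.
fn case_insensitive_sketch(exp: &str) -> String {
    scoped_flags("i", exp)
}

/// `^` and `$` inside `exp` match at line boundaries.
fn multi_line_mode_sketch(exp: &str) -> String {
    scoped_flags("m", exp)
}

/// `.` inside `exp` also matches a newline.
fn dot_matches_newline_too_sketch(exp: &str) -> String {
    scoped_flags("s", exp)
}

/// Unicode support is switched off inside `exp` (assumed to map to `-u`).
fn disable_unicode_sketch(exp: &str) -> String {
    scoped_flags("-u", exp)
}
```

Scoping a flag to a group keeps its effect local, so composing a flagged sub-expression with unflagged neighbours does not change how the neighbours match.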