#regex #pattern #pattern-matching #glob-pattern #glob #fnmatch

fnmatch-regex2

Convert a glob-style pattern to a regular expression

1 unstable release

0.3.0 Dec 21, 2023

#1346 in Encoding

2,270 downloads per month
Used in pks

BSD-2-Clause

39KB
673 lines

fnmatch-regex2 - build regular expressions to match glob-style patterns

This crate currently provides a single function, glob_to_regex, that converts a glob-style pattern with some shell extensions to a regular expression. It handles text pattern matching only; it makes no attempt to verify or construct filesystem paths.

The glob-style pattern features currently supported are:

  • any character except ?, *, [, \, or { is matched literally

  • ? matches any single character except a slash (/)

  • * matches any sequence of zero or more characters that does not contain a slash (/)

  • a backslash allows the next character to be matched literally, except for the \a, \b, \e, \n, \r, and \v sequences

  • a [...] character class supports ranges, backslash-escaping, and negation if its very first character is !; a ] is matched literally if it is the first character of the class, or the first one after the ! (e.g. []] would only match a single ] character)

  • an {a,bbb,cc} alternation supports backslash-escaping, but not nested alternations or character classes yet

Note that the * and ? wildcard patterns, as well as the character classes, will never match a slash.
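To make the rules above concrete, here is a minimal, standard-library-only sketch of how a subset of them could be turned into regular expression source text. It is not the crate's implementation: sketch_glob_to_regex and push_literal are hypothetical names, character classes and the special backslash sequences are rejected rather than translated, and treating wildcards inside an alternation as wildcards is an assumption.

```rust
/// Append `c` to `out`, backslash-escaping it if it is a regex metacharacter.
fn push_literal(out: &mut String, c: char) {
    if "\\.+*?()|[]{}^$".contains(c) {
        out.push('\\');
    }
    out.push(c);
}

/// Hypothetical helper (not part of fnmatch-regex2): translate a subset of the
/// glob syntax into anchored regex source text. Returns None for anything this
/// sketch does not handle.
fn sketch_glob_to_regex(glob: &str) -> Option<String> {
    let mut out = String::from("^");
    let mut in_alt = false;
    let mut chars = glob.chars();
    while let Some(c) = chars.next() {
        match c {
            // `?` is one character that is not a slash.
            '?' => out.push_str("[^/]"),
            // `*` is zero or more characters that are not slashes.
            '*' => out.push_str("[^/]*"),
            // A backslash makes the next character literal; the special
            // sequences (\a, \b, \e, \n, \r, \v) are rejected in this sketch.
            '\\' => match chars.next()? {
                'a' | 'b' | 'e' | 'n' | 'r' | 'v' => return None,
                lit => push_literal(&mut out, lit),
            },
            // Character classes are out of scope for this sketch.
            '[' => return None,
            // `{a,b}` becomes a non-capturing group; nesting is rejected.
            '{' => {
                if in_alt {
                    return None;
                }
                in_alt = true;
                out.push_str("(?:");
            }
            ',' if in_alt => out.push('|'),
            '}' if in_alt => {
                in_alt = false;
                out.push(')');
            }
            other => push_literal(&mut out, other),
        }
    }
    // An alternation that was opened but never closed is an error here.
    if in_alt {
        return None;
    }
    out.push('$');
    Some(out)
}
```

With this sketch, the pattern foo/test?.txt becomes ^foo/test[^/]\.txt$, which shows why a slash can never be consumed by a wildcard: both ? and * expand to a negated class that excludes it.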

Examples:

  • abc.txt would only match abc.txt

  • foo/test?.txt would match e.g. foo/test1.txt or foo/test".txt, but not foo/test/.txt

  • /etc/c[--9].conf would match e.g. /etc/c-.conf, /etc/c..conf, or /etc/c7.conf, but not /etc/c/.conf

  • linux-[0-9]*-{generic,aws} would match linux-5.2.27b1-generic and linux-4.0.12-aws, but not linux-unsigned-5.2.27b1-generic

Note that the negation modifier for character classes is !, not ^.
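The character class behaviour in the examples above (ranges, ! negation, and never matching a slash) can be illustrated with a small standalone matcher. This is only a sketch of the described semantics, not the crate's code: class_matches is a hypothetical name, it takes the text between the outer brackets as already extracted (so finding the closing bracket, including the leading-] rule, is left to the caller), and a backslash escape as the upper bound of a range is not handled.

```rust
/// Hypothetical helper (not part of fnmatch-regex2): report whether `c`
/// matches a glob character class whose body, i.e. the text between the
/// outer brackets, is `body`.
fn class_matches(body: &str, c: char) -> bool {
    // Per the rules above, a class never matches a slash, negated or not.
    if c == '/' {
        return false;
    }
    let mut items: Vec<char> = body.chars().collect();
    // Negation is spelled `!` and only counts as the very first character.
    let negated = items.first() == Some(&'!');
    if negated {
        items.remove(0);
    }
    let mut found = false;
    let mut i = 0;
    while i < items.len() {
        // A backslash makes the following character literal.
        let mut lo = items[i];
        if lo == '\\' && i + 1 < items.len() {
            i += 1;
            lo = items[i];
        }
        i += 1;
        // `lo-hi` is a range when `-` is followed by another character;
        // a trailing `-` is treated as a literal on the next iteration.
        let mut hi = lo;
        if i + 1 < items.len() && items[i] == '-' {
            hi = items[i + 1];
            i += 2;
        }
        if lo <= c && c <= hi {
            found = true;
        }
    }
    found != negated
}
```

For the [--9] class from the examples, the body --9 is a single range from - to 9. That range covers ., the digits, and nominally the slash as well, which is why the explicit slash check comes first.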

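use std::error::Error;

// Assumes the crate's error type implements std::error::Error,
// so that `?` can convert it into a boxed error.
fn main() -> Result<(), Box<dyn Error>> {
    let re_name = fnmatch_regex2::glob_to_regex("linux-[0-9]*-{generic,aws}")?;
    for name in &[
        "linux-5.2.27b1-generic",
        "linux-4.0.12-aws",
        "linux-unsigned-5.2.27b1-generic",
    ] {
        let okay = re_name.is_match(name);
        println!("{}: {}", name, if okay { "yes" } else { "no" });
        // Only the name containing "unsigned" should fail to match.
        assert_eq!(okay, !name.contains("unsigned"));
    }
    Ok(())
}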

Dependencies

~2.6–3.5MB
~63K SLoC