#regex #digits #string #character

bin+lib rusty_regex

A minimalistic regex engine in Rust using the pipeline: Regex -> AST -> NFA -> DFA -> Match(String)

2 unstable releases

new 0.2.0 Apr 4, 2025
0.1.0 Apr 4, 2025

#667 in Text processing

26 downloads per month

MIT license

46KB
994 lines

Rusty Regex

Rusty Regex is a minimalistic regex engine written in Rust. It compiles a regex pattern into an Abstract Syntax Tree (AST), then builds a Non-deterministic Finite Automaton (NFA), converts that NFA into a Deterministic Finite Automaton (DFA) via subset construction, and finally uses the DFA to match input strings.

Supported Features

  • Literals & Wildcards
    • Literal characters (e.g., "abc")
    • Wildcard . matching any single character
  • Character Classes
    • Explicit classes: e.g., [abc]
    • Ranges: e.g., [a-c]
  • Predefined Classes
    • \d for digits
    • \w for word characters (letters, digits, underscore)
  • Quantifiers
    • Zero or more (*)
    • One or more (+)
    • Zero or one (?)
    • Exact count ({n})
    • Range ({m,n} or {m,})
  • Grouping & Capturing
    • Parentheses for grouping (all groups are capturing by default)
  • Alternation
    • The pipe operator (|) for alternatives
  • Anchors
    • Start (^) and end ($) anchors to match input positions

Design Overview

Rusty Regex works in several phases:

  1. Parsing: Converts the regex string into an AST.
  2. NFA Construction: Transforms the AST into an NFA.
  3. DFA Conversion: Converts the NFA into a DFA using subset construction.
  4. Matching: Simulates the DFA over input strings (scanning for matches).

Example Usage

Below is a simple example demonstrating how to compile a regex and test input strings:

use rusty_regex::Regex;

fn main() {
    // Compile a regex pattern.
    let regex = Regex::new("^(a|b)*c$").unwrap();

    // Test various inputs.
    let tests = vec![
        ("aaabc", true),
        ("bbc", true),
        ("c", true),
        ("aaab", false),
    ];

    for (input, expected) in tests {
        let result = regex.is_match(input);
        println!("Input: {:?}, Match: {}, Expected: {}", input, result, expected);
    }
}

No runtime deps