6 releases

Uses new Rust 2024

new 0.2.2 May 22, 2025
0.2.1 May 22, 2025
0.1.3 May 22, 2025

#346 in Text processing

Download history 85/week @ 2025-05-16

85 downloads per month

MIT license

200KB
1K SLoC

fuzzy-aho-corasick

A high-performance, Unicode-aware Rust implementation of the Aho–Corasick automaton with fuzzy matching (insertions, deletions, substitutions, transpositions) and optional SIMD acceleration.

This library is rooted in the scientific work “Fuzzified Aho–Corasick Search Automata” by Z. Horák, V. Snášel, A. Abraham, and A. E. Hassanien (see the PDF).

Features

  • Exact & Fuzzy Matching: Locate exact patterns or allow configurable edit operations (Levenshtein distance + transposition).
  • Unicode‐Aware: Operates on Unicode grapheme clusters and supports case‐insensitive matching.
  • SIMD‐Accelerated: Uses std::simd on supported targets (ARM NEON, x86_64 AVX2/AVX512) for batch output scoring.
  • Fine‐Grained Limits: Per‐pattern caps on insertions, deletions, substitutions, swaps, or total edits.
  • Non‐Overlapping Option: Greedily choose longest non‐overlapping matches from left to right.
  • Fuzzy Replacer: Perform fuzzy find‐and‐replace, preserving unmatched segments.

Installation

Add to your Cargo.toml:

[dependencies]
fuzzy-aho-corasick = "0.1"

Enable SIMD acceleration (optional, off by default):

[dependencies]
fuzzy-aho-corasick = { version = "0.1", features = ["simd"] }

Quick Start

use fuzzy_aho_corasick::{FuzzyAhoCorasickBuilder, FuzzyLimits};

fn main() {
    // Build an engine allowing up to 1 edit per match
    let engine = FuzzyAhoCorasickBuilder::new()
        .fuzzy(FuzzyLimits::new().edits(1))
        .case_insensitive(true)
        .non_overlapping(true)
        .build(["hello", "world"]);

    let text = "H3llo W0rld!";
    let matches = engine.search(text, 0.8);

    for m in matches {
        println!(
            "matched pattern '{}' as '{}' (score {:.2})",
            m.pattern, m.text, m.similarity
        );
    }
    // Output: matched pattern 'hello' as 'H3llo' (score 0.90)
}

Builder API

let builder = FuzzyAhoCorasickBuilder::new()
// Maximum total edits (ins+del+sub+swap) per match
.fuzzy(FuzzyLimits::new().edits(2))
// Custom penalties
.penalties(FuzzyPenalties::default())
.substitution(0.7)
.insertion(0.9)
.deletion(0.9)
.swap(1))
// Unicode case folding
.case_insensitive(true)
// No overlapping matches
.non_overlapping(true);

let engine = builder.build(["pattern1", "pattern2"]);

Fuzzy Replacer

let replacer = FuzzyAhoCorasickBuilder::new().build_replacer([
("foo", "bar"),
("baz", "qux"),
]);

let out = replacer.replace("F00 and BAZ!", 0.8);
assert_eq!(out, "bar and qux!");

SIMD Acceleration

When compiled with the simd feature on supported architectures, output scoring batches are vectorized via std::simd, offering significant speedups for large pattern sets.

License

This project is distributed under the MIT License. See LICENSE for details.

Dependencies