#fuzzy #ngrams #fuzzy-search #shingles

noodler

A port of the python-ngram project that provides fuzzy search using N-grams

1 unstable release

0.1.0 Apr 11, 2023

#1276 in Text processing


70 downloads per month

MIT/Apache

25KB
596 lines

🍜 Noodler

In computer science, "noodler" is used to describe programs that handle text. Because algorithms like n-grams are typically used to extract information from text, similar to pulling strands of noodles out of a pile of dough, "noodler" can be associated with algorithms that extract information from text because they can be seen as "processing" programs for text, just as noodle makers "produce" noodles from dough.

ChatGPT

Noodler is a Rust port of the python-ngram project. It provides fuzzy string search based on N-gram similarity.
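To make the idea of N-gram similarity concrete, here is a minimal sketch that is independent of noodler itself. The helper names `ngrams` and `similarity` are hypothetical, and the sketch makes simplifying assumptions that may not match the crate: it uses sets of character N-grams, adds no padding, and scores a pair as shared N-grams divided by all distinct N-grams.

```rust
use std::collections::HashSet;

/// Split `text` into its set of character n-grams of length `n`.
/// Simplification (an assumption, not noodler's behavior): no padding is
/// added, so a string shorter than `n` yields an empty set.
fn ngrams(text: &str, n: usize) -> HashSet<String> {
    if n == 0 {
        // `windows(0)` would panic, so treat a zero arity as "no n-grams".
        return HashSet::new();
    }
    let chars: Vec<char> = text.chars().collect();
    chars
        .windows(n)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Shared n-grams divided by all distinct n-grams of both strings.
/// The result lies between 0.0 (nothing in common) and 1.0 (same n-gram set).
fn similarity(a: &str, b: &str, n: usize) -> f64 {
    let ga = ngrams(a, n);
    let gb = ngrams(b, n);
    let shared = ga.intersection(&gb).count();
    let total = ga.union(&gb).count();
    if total == 0 {
        0.0
    } else {
        shared as f64 / total as f64
    }
}
```

With bigrams, "tomato" and "tomacco" share "to", "om" and "ma" out of 7 distinct bigrams, so this sketch scores them at 3/7, while "pie" and "tomacco" share nothing and score 0.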

✍️ Example

use noodler::NGram;

let ngram = NGram::<&str>::builder()
    .arity(2)
    .warp(3.0)
    .threshold(0.75)
    .build()
    // Feed with known words
    .fill(vec!["pie", "animal", "tomato", "seven", "carbon"]);

// Try an unknown/misspelled word, and find a similar match
let word = "tomacco";
let top = ngram.search_sorted(word).next();
if let Some((text, similarity)) = top {
    if similarity > 0.99 {
        println!("{}", text);
    } else {
        println!(
            "{} (did you mean {}? [{:.0}% match])",
            word,
            text,
            similarity * 100.0
        );
    }
} else {
    println!("🗙 {}", word);
}
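The builder's `warp` and `threshold` settings can be illustrated with a small standalone sketch. It assumes noodler keeps the warp formula used by python-ngram, which this page does not confirm; the function names `warped_similarity` and `passes_threshold` are hypothetical and not part of the crate's API.

```rust
/// Warped similarity in the style of python-ngram (an assumption about
/// noodler): with `warp == 1.0` this is simply `shared / total`; a larger
/// `warp` raises the score of pairs that differ in only a few n-grams.
fn warped_similarity(shared: usize, total: usize, warp: f64) -> f64 {
    if total == 0 {
        return 0.0;
    }
    let all = total as f64;
    // N-grams that are not shared; saturating_sub guards against shared > total.
    let diff = total.saturating_sub(shared) as f64;
    (all.powf(warp) - diff.powf(warp)) / all.powf(warp)
}

/// A candidate is reported only if its score reaches `threshold`.
fn passes_threshold(shared: usize, total: usize, warp: f64, threshold: f64) -> bool {
    warped_similarity(shared, total, warp) >= threshold
}
```

Under these assumptions, a pair with 3 shared N-grams out of 7 scores 3/7 with a warp of 1.0 and stays below a 0.75 threshold, but scores 279/343 with a warp of 3.0 and passes it. The exact N-gram counts noodler produces for a given pair may differ, for example if it pads strings.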

💭 Inspired by

Please check out these awesome works that helped a lot in the creation of noodler:

  • python-ngram: Set that supports searching by ngram similarity.
  • ngrammatic: A Rust crate providing fuzzy search/string matching using N-grams.

🚩 Minimal supported Rust version

All tests pass with rustc v1.41; earlier versions may not compile.

⚖️ License

Licensed under either of

  • MIT license
  • Apache License

at your option.

No runtime deps