#levenshtein

fuzzy_prefix_search

Fuzzy search for prefix matches

14 releases

0.3.6 Sep 12, 2024
0.3.5 Sep 11, 2024
0.3.2 Aug 31, 2024
0.2.0 Aug 7, 2024
0.1.5 Aug 7, 2024

#9 in #levenshtein

Download history 554/week @ 2024-08-04 29/week @ 2024-08-11 139/week @ 2024-08-18 139/week @ 2024-08-25 360/week @ 2024-09-01 227/week @ 2024-09-08 62/week @ 2024-09-15 29/week @ 2024-09-22 17/week @ 2024-09-29 2/week @ 2024-10-06

120 downloads per month

MIT/Apache

24KB
341 lines

github crates.io docs.rs tests

Fuzzy Prefix Search

Flexible Trie implementation in Rust for fuzzy prefix string searching and auto-completion.

Documentation:

Features

  • Prefix-based fuzzy searching (Levenshtein distance)
  • Fuzzy search with customizable edit distance
  • Multiple data associations per word
  • Jaro-Winkler similarity scoring for search results
  • Thread safe
  • No dependencies

Installation

Add this to your Cargo.toml:

[dependencies]
fuzzy_prefix_search = "0.2"

Usage

Here's a quick example of how to use the Fuzzy Prefix Search:

use fuzzy_prefix_search::Trie;

fn main() {
    let mut trie = Trie::new();

    // Insert words with associated data
    trie.insert("apple", 1);
    trie.insert("application", 2);
    trie.insert("appetite", 3);

    // Perform a fuzzy search
    let results = trie.search_within_distance("appl", 1);

    for result in results {
        println!("Word: {}, Data: {:?}", result.word, result.data);
    }

    // Search with similarity scoring
    let scored_results = trie.search_within_distance_scored("aple", 2);

    for result in scored_results {
        println!("Word: {}, Data: {:?}, Score: {}", result.word, result.data, result.score);
    }
}

Advanced Usage

Custom Data Types

The Trie supports any data type that implements Clone, PartialEq, Eq, and Hash:

#[derive(Clone, PartialEq, Eq, Hash)]
struct CustomData {
    id: u32,
    value: String,
}

let mut trie = Trie::new();
trie.insert("example", CustomData { id: 1, value: "Test".to_string() });

Removing Data

You can remove all occurrences of a specific data value:

trie.remove_all(&2);

Performance

  • O(k) time complexity for insertion, where k is the length of the word
  • Space-efficient storage using a tree structure with shared prefixes
  • TODO: Benchmarks, optimizations, algorithm selection...

Caveat Emptor: we use unsafe in deletes for 2x read performance compared to Rc/RefCell approach.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Resources

License

This project is licensed under the MIT or Apache 2.0 License - see the LICENSE files for details.

No runtime deps