#levenshtein #edit-distance #automata #fuzzy-string

levenshtein_automata

Creates Levenshtein Automata in an efficient manner

3 unstable releases

0.2.1 May 18, 2021
0.2.0 Mar 30, 2020
0.1.1 May 19, 2018
0.1.0 May 8, 2018

#201 in Text processing

Download history 81662/week @ 2024-10-07 80969/week @ 2024-10-14 65787/week @ 2024-10-21 59517/week @ 2024-10-28 69762/week @ 2024-11-04 69953/week @ 2024-11-11 72191/week @ 2024-11-18 53276/week @ 2024-11-25 73878/week @ 2024-12-02 58961/week @ 2024-12-09 63838/week @ 2024-12-16 27261/week @ 2024-12-23 35892/week @ 2024-12-30 67502/week @ 2025-01-06 75087/week @ 2025-01-13 62888/week @ 2025-01-20

243,520 downloads per month
Used in 151 crates (9 directly)

MIT license

58KB
1.5K SLoC

Levenshtein-automaton

This crate makes it fast and simple to build a finite determinic automaton that computes the levenshtein distance from a given string.

Example

use levenshtein_automata::{LevenshteinAutomatonBuilder, Distance};

fn main() {

    // Building this factory is not free.
    let lev_automaton_builder = LevenshteinAutomatonBuilder::new(2, true);

    // We can now build an entire dfa.
    let dfa = lev_automaton_builder.build_dfa("Levenshtein");

    let mut state = dfa.initial_state();
    for &b in "Levenshtain".as_bytes() {
        state = dfa.transition(state, b);
    }

    assert_eq!(dfa.distance(state), Distance::Exact(1));
}

The implementation is based on the following paper Fast String Correction with Levenshtein-Automata (2002) by by Klaus Schulz and Stoyan Mihov. I also tried to explain it in the following blog post.

Bench

The time taken by the construction a Levenshtein DFA strongly depends on the max distance it can measure and the length of the input string.

Here is the time spent to build a Levenshtein DFA for the string "Levenshtein"

dfa dist1 no transposition        35,627 ns/iter (+/- 3,237)
dfa dist1 with transposition      36,493 ns/iter (+/- 12,680)
dfa dist2 no transposition        97,137 ns/iter (+/- 14,556)
dfa dist2 with transposition     100,958 ns/iter (+/- 4,231)
dfa dist3 no transposition       834,412 ns/iter (+/- 158,329)
dfa dist3 with transposition   1,414,523 ns/iter (+/- 396,278)
dfa dist4 no transposition     4,716,365 ns/iter (+/- 869,024)
dfa dist4 with transposition   8,044,162 ns/iter (+/- 594,523)

Dependencies

~0–395KB