str-distance

Distance metrics to evaluate distances between strings

1 unstable release

0.1.0 Feb 25, 2020


MIT/Apache

76KB
1.5K SLoC

A crate to evaluate distances between strings (and other sequences).

Heavily inspired by the Julia package StringDistances.

Distance Metrics

Usage

The `str_distance::str_distance*` convenience functions

`str_distance` and `str_distance_normalized` take the two input strings and a `DistanceMetric`, and determine the distance between the strings using that metric. `str_distance_normalized` evaluates the normalized distance between the two strings: a value of `0.0` corresponds to "zero distance", i.e. both strings are considered equal by means of the metric, whereas a value of `1.0` corresponds to the maximum distance that can exist between the strings.

Calling `str_distance::str_distance*` is just a convenience for calling the corresponding `str_distance*` method of the `DistanceMetric` itself, e.g. `Levenshtein::default().str_distance("kitten", "sitting")`.
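The delegation described above can be sketched in a few lines of std-only Rust. This is not the crate's source: the trait `Metric`, the struct `Lev`, and their method names are hypothetical stand-ins, and the normalization used here (raw distance divided by the character length of the longer input, with two empty strings giving `0.0`) is an assumption that happens to reproduce the normalized Levenshtein examples in this README.

```rust
/// Hypothetical stand-in for a distance-metric trait (not the crate's API).
trait Metric {
    /// Raw distance between two strings.
    fn str_distance(&self, a: &str, b: &str) -> usize;

    /// Normalized distance in [0.0, 1.0], assuming the raw distance never
    /// exceeds the length (in chars) of the longer input.
    fn str_normalized(&self, a: &str, b: &str) -> f64 {
        let max_len = a.chars().count().max(b.chars().count());
        if max_len == 0 {
            return 0.0; // two empty strings are identical
        }
        self.str_distance(a, b) as f64 / max_len as f64
    }
}

/// Hypothetical Levenshtein metric using two-row dynamic programming.
struct Lev;

impl Metric for Lev {
    fn str_distance(&self, a: &str, b: &str) -> usize {
        let b: Vec<char> = b.chars().collect();
        // prev[j] = distance between the processed prefix of `a` and b[..j]
        let mut prev: Vec<usize> = (0..=b.len()).collect();
        for (i, ca) in a.chars().enumerate() {
            let mut cur = vec![i + 1; b.len() + 1];
            for (j, &cb) in b.iter().enumerate() {
                let cost = usize::from(ca != cb);
                cur[j + 1] = (prev[j] + cost) // substitute or keep
                    .min(prev[j + 1] + 1)     // delete from `a`
                    .min(cur[j] + 1);         // insert from `b`
            }
            prev = cur;
        }
        prev[b.len()]
    }
}

/// Convenience functions: they only forward to the metric's own methods.
fn str_distance<M: Metric>(a: &str, b: &str, metric: M) -> usize {
    metric.str_distance(a, b)
}

fn str_distance_normalized<M: Metric>(a: &str, b: &str, metric: M) -> f64 {
    metric.str_normalized(a, b)
}
```

With these definitions, `str_distance("kitten", "sitting", Lev)` and `Lev.str_distance("kitten", "sitting")` both evaluate to 3. Because the free functions only forward their arguments, supporting another metric means implementing the trait for a new type; the convenience functions themselves never change.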

Example

The Levenshtein metric can be given a maximum distance: once the distance is known to exceed it, the calculation of the exact distance is aborted early and the result is reported as exceeded.

Distance

use str_distance::*;

// calculate the exact distance 
assert_eq!(str_distance("kitten", "sitting", Levenshtein::default()), DistanceValue::Exact(3));

// short circuit if distance exceeds 10
let s1 = "Wisdom is easily acquired when hiding under the bed with a saucepan on your head.";
let s2 = "The quick brown fox jumped over the angry dog.";
assert_eq!(str_distance(s1, s2, Levenshtein::with_max_distance(10)), DistanceValue::Exceeded(10));
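The early abort can be illustrated with a minimal std-only sketch. It is not the crate's implementation: the `Bounded` enum and the `bounded_levenshtein` function are hypothetical names, and the cut-off rule (give up as soon as every entry of the current dynamic-programming row is above `max`, which is safe because row minima never decrease) is one common way to short-circuit, assumed here for illustration. Like `DistanceValue::Exceeded(10)` in the example above, the sketch reports the bound itself when it gives up.

```rust
/// Hypothetical result type: an exact distance, or the bound that was exceeded.
#[derive(Debug, PartialEq)]
enum Bounded {
    Exact(usize),
    Exceeded(usize),
}

/// Levenshtein distance that stops once the result must exceed `max`.
fn bounded_levenshtein(a: &str, b: &str, max: usize) -> Bounded {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // The distance is at least the difference in length.
    if a.len().abs_diff(b.len()) > max {
        return Bounded::Exceeded(max);
    }

    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1; b.len() + 1];
        for (j, &cb) in b.iter().enumerate() {
            let cost = usize::from(ca != cb);
            cur[j + 1] = (prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1);
        }
        // Row minima never decrease, so if the whole row is above `max`,
        // the final distance is too. `cur` is never empty, so `min` is Some.
        if *cur.iter().min().unwrap() > max {
            return Bounded::Exceeded(max);
        }
        prev = cur;
    }

    // The last row may still end above `max` even if its minimum did not.
    let dist = prev[b.len()];
    if dist > max {
        Bounded::Exceeded(max)
    } else {
        Bounded::Exact(dist)
    }
}
```

For example, `bounded_levenshtein("kitten", "sitting", 3)` yields `Bounded::Exact(3)`, while a bound of 2 yields `Bounded::Exceeded(2)`. Keeping the two variants apart means a cut-off bound can never be mistaken for a real distance.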

Normalized Distance

use str_distance::*;
assert_eq!(str_distance_normalized("", "", Levenshtein::default()), 0.0);
assert_eq!(str_distance_normalized("nacht", "nacht", Levenshtein::default()), 0.0);
assert_eq!(str_distance_normalized("abc", "def", Levenshtein::default()), 1.0);

The DistanceMetric trait

use str_distance::{DistanceMetric, SorensenDice};
// QGram-based metrics require the length of the fragments (q-grams) used for comparison.
// For `SorensenDice`, the default is 2.
assert_eq!(SorensenDice::new(2).str_distance("nacht", "night"), 0.75);
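For reference, the Sørensen-Dice distance is 1 - 2|A ∩ B| / (|A| + |B|), where A and B are the q-gram sets of the two strings. With q = 2, "nacht" has the bigrams na, ac, ch, ht and "night" has ni, ig, gh, ht; they share only ht, so the distance is 1 - 2·1/8 = 0.75, matching the assertion above. The std-only sketch below computes this. Counting distinct q-grams (a set rather than a multiset) is an assumption, and `qgrams` and `sorensen_dice` are hypothetical helper names, not the crate's API.

```rust
use std::collections::HashSet;

/// Distinct q-grams (runs of `q` consecutive chars) of `s`.
fn qgrams(s: &str, q: usize) -> HashSet<Vec<char>> {
    assert!(q > 0, "q-gram length must be positive");
    let chars: Vec<char> = s.chars().collect();
    chars.windows(q).map(|w| w.to_vec()).collect()
}

/// Sørensen-Dice distance 1 - 2|A ∩ B| / (|A| + |B|) over q-gram sets.
fn sorensen_dice(a: &str, b: &str, q: usize) -> f64 {
    let (ga, gb) = (qgrams(a, q), qgrams(b, q));
    let total = ga.len() + gb.len();
    if total == 0 {
        // Assumption: inputs too short to yield any q-gram count as equal.
        return 0.0;
    }
    let common = ga.intersection(&gb).count();
    1.0 - 2.0 * common as f64 / total as f64
}
```

A multiset variant would count repeated q-grams such as the two "an" bigrams in "banana" separately, so the two choices can disagree on strings with repeated fragments.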

`DistanceMetric` was designed for `str` types, but it is not limited to them: distances can be calculated for any data type whose items are comparable and that can be passed as an `IntoIterator`, e.g. a `Vec` or a slice.

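use str_distance::{DistanceMetric, Levenshtein};

// `distance` accepts any comparable `IntoIterator` inputs, here integer arrays;
// the returned `DistanceValue` is dereferenced to get the underlying distance.
assert_eq!(*Levenshtein::default().distance(&[1, 2, 3], &[1, 2, 3, 4, 5, 6]), 3);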
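The idea of computing a distance for any comparable items passed as an `IntoIterator` can be sketched with a std-only generic function. This is an illustration rather than the crate's code: the function name `levenshtein` is hypothetical, and the bound `A::Item: PartialEq<B::Item>` is one plausible way to express "comparable"; the crate's actual trait bounds may differ.

```rust
/// Levenshtein distance over any two sequences whose items can be compared,
/// e.g. arrays, `Vec`s, slices, or `str::chars()`.
fn levenshtein<A, B>(a: A, b: B) -> usize
where
    A: IntoIterator,
    B: IntoIterator,
    A::Item: PartialEq<B::Item>,
{
    // Buffer the second sequence; it is scanned once per item of the first.
    let b: Vec<B::Item> = b.into_iter().collect();
    // prev[j] = distance between the items of `a` seen so far and b[..j]
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, x) in a.into_iter().enumerate() {
        let mut cur = vec![i + 1; b.len() + 1];
        for (j, y) in b.iter().enumerate() {
            let cost = usize::from(x != *y);
            cur[j + 1] = (prev[j] + cost) // substitute or keep
                .min(prev[j + 1] + 1)     // delete `x`
                .min(cur[j] + 1);         // insert `y`
        }
        prev = cur;
    }
    prev[b.len()]
}
```

Here `levenshtein(&[1, 2, 3], &[1, 2, 3, 4, 5, 6])` evaluates to 3, the same value as the example above, and `levenshtein("kitten".chars(), "sitting".chars())` also evaluates to 3. Only the second input is collected, so the first can be any single-pass iterator.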

Documentation

Full docs available at docs.rs

License

Licensed under either the Apache License or the MIT license.

No runtime deps