## str-distance

Distance metrics to evaluate distances between strings

### 1 unstable release

 0.1.0 Feb 25, 2020


MIT/Apache


# str-distance

A crate to evaluate distances between strings (and others).

Heavily inspired by the Julia package StringDistances.

## Distance Metrics

• Jaro Distance

• Levenshtein Distance

• Damerau-Levenshtein Distance

• RatcliffObershelp Distance

• Q-gram distances, which compare the sets of all slices of length `q` in each string, where `q > 0`

The crate also includes distance "modifiers" that can be applied to any distance:

• Winkler diminishes the distance of strings with common prefixes. The Winkler adjustment was originally defined for the Jaro similarity score, but this crate defines it for any string distance.
• TokenSort adjusts for differences in word order by reordering words alphabetically.
• TokenSet adjusts for differences in word order and word count by comparing the intersection of two strings with each string.
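To make the TokenSort idea concrete, here is a minimal sketch using only the standard library. The helper name `token_sort` is hypothetical and is not part of the crate's API; it assumes words are separated by whitespace and shows only the normalization step that would run before a distance is computed.

```rust
/// Hypothetical helper (not the crate's API): reorders the
/// whitespace-separated words of `s` alphabetically.
fn token_sort(s: &str) -> String {
    let mut words: Vec<&str> = s.split_whitespace().collect();
    words.sort_unstable();
    words.join(" ")
}
```

Under this sketch, `token_sort("the quick brown fox")` and `token_sort("fox brown the quick")` produce the same string, so any distance applied afterwards would treat the two inputs as equal.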

## Usage

### The `str_distance::str_distance*` convenience functions

`str_distance` and `str_distance_normalized` take two string inputs and determine the distance between them using the passed `DistanceMetric`. `str_distance_normalized` evaluates the normalized distance between the two strings: a value of `0.0` is the "zero distance", meaning both strings are considered equal under the metric, whereas a value of `1.0` is the maximum distance that can exist between the strings.

The `str_distance::str_distance*` functions are just a convenience for calling the corresponding `str_distance*` method on the `DistanceMetric` itself.

#### Example

Levenshtein metrics offer the possibility to define a maximum distance at which the further calculation of the exact distance is aborted early.

Distance

```rust
use str_distance::*;

// calculate the exact distance
assert_eq!(str_distance("kitten", "sitting", Levenshtein::default()), DistanceValue::Exact(3));

// short-circuit if the distance exceeds 10
let s1 = "Wisdom is easily acquired when hiding under the bed with a saucepan on your head.";
let s2 = "The quick brown fox jumped over the angry dog.";
assert_eq!(str_distance(s1, s2, Levenshtein::with_max_distance(10)), DistanceValue::Exceeded(10));
```
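The early abort can be illustrated with a standalone sketch of a bounded Levenshtein computation. This is not the crate's implementation: the names `Bounded` and `levenshtein_bounded` are hypothetical, and the stopping rule shown (give up once every cell of the current row exceeds the limit) is just one common way to do it.

```rust
/// Hypothetical result type, loosely mirroring the exact/exceeded split above.
#[derive(Debug, PartialEq)]
enum Bounded {
    Exact(usize),
    Exceeded(usize),
}

/// Sketch of Levenshtein distance that stops once the result must exceed `max`.
fn levenshtein_bounded(a: &str, b: &str, max: usize) -> Bounded {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // The length difference is a lower bound on the edit distance.
    if a.len().max(b.len()) - a.len().min(b.len()) > max {
        return Bounded::Exceeded(max);
    }

    // prev[j] holds the distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = Vec::with_capacity(b.len() + 1);
        cur.push(i + 1);
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let best = (prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1);
            cur.push(best);
        }
        // Row minima never decrease, so if every cell is already above
        // `max`, the final distance must be above `max` as well.
        if cur.iter().all(|&v| v > max) {
            return Bounded::Exceeded(max);
        }
        prev = cur;
    }

    let d = prev[b.len()];
    if d > max { Bounded::Exceeded(max) } else { Bounded::Exact(d) }
}
```

In this sketch, `levenshtein_bounded("kitten", "sitting", 10)` yields `Bounded::Exact(3)`, while a limit of `2` yields `Bounded::Exceeded(2)` without the caller ever seeing the exact value.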

Normalized Distance

```rust
use str_distance::*;

assert_eq!(str_distance_normalized("", "", Levenshtein::default()), 0.0);
assert_eq!(str_distance_normalized("nacht", "nacht", Levenshtein::default()), 0.0);
assert_eq!(str_distance_normalized("abc", "def", Levenshtein::default()), 1.0);
```
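The normalized values above are consistent with dividing the raw edit distance by the length of the longer input, since a Levenshtein distance can never exceed that length. The sketch below shows that scaling on its own; the helper name `normalize` is hypothetical, and whether the crate normalizes in exactly this way is an assumption.

```rust
/// Hypothetical helper (not the crate's API): scales a raw edit distance
/// into the range [0.0, 1.0] using the length of the longer input.
fn normalize(distance: usize, len_a: usize, len_b: usize) -> f64 {
    let max_len = len_a.max(len_b);
    if max_len == 0 {
        // Two empty inputs are treated as identical.
        return 0.0;
    }
    distance as f64 / max_len as f64
}
```

For example, "abc" and "def" differ in all three positions, so `normalize(3, 3, 3)` gives `1.0`, while two empty inputs give `0.0`.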

### The `DistanceMetric` trait

```rust
use str_distance::{DistanceMetric, SorensenDice};

// Q-gram metrics require the length `q` of the fragments used for comparison.
// For `SorensenDice` the default is 2.
assert_eq!(SorensenDice::new(2).str_distance("nacht", "night"), 0.75);
```
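To see where `0.75` comes from, here is a standalone sketch of a Sørensen-Dice distance over q-gram sets. The function names `qgrams` and `sorensen_dice` are hypothetical, and the sketch assumes set semantics (each distinct q-gram counts once) and a distance of `0.0` when neither input has any q-gram; the crate's internals may differ.

```rust
use std::collections::HashSet;

/// Collects the distinct slices of `q` consecutive characters in `s`.
fn qgrams(s: &str, q: usize) -> HashSet<String> {
    assert!(q > 0, "q must be greater than 0");
    let chars: Vec<char> = s.chars().collect();
    chars
        .windows(q)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Sketch: 1 - 2 * |A ∩ B| / (|A| + |B|) over the two q-gram sets.
fn sorensen_dice(a: &str, b: &str, q: usize) -> f64 {
    let qa = qgrams(a, q);
    let qb = qgrams(b, q);
    if qa.is_empty() && qb.is_empty() {
        return 0.0;
    }
    let shared = qa.intersection(&qb).count();
    1.0 - 2.0 * shared as f64 / (qa.len() + qb.len()) as f64
}
```

With `q = 2`, "nacht" yields {na, ac, ch, ht} and "night" yields {ni, ig, gh, ht}. Only "ht" is shared, so the distance is 1 - 2·1/(4 + 4) = 0.75.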

`DistanceMetric` was designed for `str` types, but is not limited to them. Distances can be calculated for any data type whose items are comparable and that can be passed as an `IntoIterator`, e.g. a `Vec`.

```rust
use str_distance::{DistanceMetric, Levenshtein, DistanceValue};

assert_eq!(*Levenshtein::default().distance(&[1, 2, 3], &[1, 2, 3, 4, 5, 6]), 3);
```
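The reason this works for non-string data is that edit distance only needs to test items for equality. The following sketch, independent of the crate, shows a Levenshtein distance written against `IntoIterator` and `PartialEq`; the function name `levenshtein_generic` is hypothetical and the crate's actual trait bounds may differ.

```rust
/// Sketch: Levenshtein distance over any two sequences of comparable items.
fn levenshtein_generic<A, B, T>(a: A, b: B) -> usize
where
    A: IntoIterator<Item = T>,
    B: IntoIterator<Item = T>,
    T: PartialEq,
{
    let a: Vec<T> = a.into_iter().collect();
    let b: Vec<T> = b.into_iter().collect();

    // Single-row dynamic programming: row[j] is the distance between
    // the processed prefix of `a` and the first j items of `b`.
    let mut row: Vec<usize> = (0..=b.len()).collect();
    for (i, x) in a.iter().enumerate() {
        let mut diag = row[0]; // previous row, column j
        row[0] = i + 1;
        for (j, y) in b.iter().enumerate() {
            let cost = if x == y { 0 } else { 1 };
            let next = (diag + cost).min(row[j] + 1).min(row[j + 1] + 1);
            diag = row[j + 1];
            row[j + 1] = next;
        }
    }
    row[b.len()]
}
```

The same function accepts integer vectors, slices, or `chars()` iterators, for example `levenshtein_generic(vec![1, 2, 3], vec![1, 2, 3, 4, 5, 6])` or `levenshtein_generic("kitten".chars(), "sitting".chars())`.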

## Documentation

Full documentation is available at docs.rs.