trigram

Trigram-based string similarity for fuzzy matching

11 unstable releases (3 breaking)

✓ Uses Rust 2018 edition

0.4.4 Oct 6, 2019
0.4.3 Oct 6, 2019
0.3.0 Oct 6, 2019
0.2.4 Oct 6, 2019
0.1.0 Oct 5, 2019

Used in 1 crate

Apache-2.0

13KB
216 lines

trigram

This Rust crate contains functions for fuzzy string matching.

It exports two functions. The similarity function returns the similarity of two strings, and the find_words_iter function returns an iterator of matches for a smaller string (needle) in a larger string (haystack).

The similarity of strings is computed based on their trigrams, meaning their 3-character substrings: https://en.wikipedia.org/wiki/Trigram.
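
To make the trigram idea concrete, here is a minimal sketch of a trigram-based similarity score. It is not this crate's actual implementation. It assumes the convention described in the pg_trgm documentation: lowercase the input, split it into words on non-alphanumeric characters, pad each word with two leading spaces and one trailing space, and divide the number of shared trigrams by the number of distinct trigrams overall. The names trigram_set and sketch_similarity are hypothetical.

```rust
use std::collections::HashSet;

/// Collects the set of trigrams of `s` (hypothetical helper).
/// Assumption: pg_trgm-style padding, i.e. each lowercased word gets
/// two leading spaces and one trailing space before it is cut into
/// 3-character windows.
fn trigram_set(s: &str) -> HashSet<String> {
    let lowered = s.to_lowercase();
    let mut set = HashSet::new();
    for word in lowered.split(|c: char| !c.is_alphanumeric()) {
        if word.is_empty() {
            continue;
        }
        let padded: Vec<char> = format!("  {} ", word).chars().collect();
        for window in padded.windows(3) {
            set.insert(window.iter().collect::<String>());
        }
    }
    set
}

/// Hypothetical stand-in for a trigram similarity function:
/// shared trigrams divided by all distinct trigrams.
fn sketch_similarity(a: &str, b: &str) -> f32 {
    let ta = trigram_set(a);
    let tb = trigram_set(b);
    let total = ta.union(&tb).count();
    if total == 0 {
        // Assumption: inputs that produce no trigrams at all score 0.
        return 0.0;
    }
    let shared = ta.intersection(&tb).count();
    shared as f32 / total as f32
}
```

Under these assumptions, "color" has 6 trigrams, "colour" has 7, and they share 4, so sketch_similarity("color", "colour") is 4/9, or about 0.444.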

Trying it out

Here is how to run the examples:

$ cargo run --example similarity color colour
...
0.44444445

$ cargo run --example find_words_iter
bufalo
buffalow
Bungalo
biffalo
buffaloo
huffalo
snuffalo
fluffalo
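
For intuition about what a fuzzy word search over a haystack involves, here is a rough, self-contained sketch. It does not use this crate and does not claim to reproduce how find_words_iter works. It assumes a single-word needle, splits the haystack on non-alphanumeric characters, and keeps the words whose trigram similarity to the needle reaches a caller-supplied threshold. The names trigrams and find_similar_words, and the threshold parameter, are hypothetical.

```rust
use std::collections::HashSet;

/// Trigrams of a single word, padded with two leading spaces and one
/// trailing space (assumption: the pg_trgm padding convention).
fn trigrams(word: &str) -> HashSet<String> {
    let padded: Vec<char> = format!("  {} ", word.to_lowercase()).chars().collect();
    padded
        .windows(3)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Hypothetical helper: lazily yields the words of `haystack` whose
/// trigram similarity to `needle` is at least `threshold`.
fn find_similar_words<'h>(
    needle: &str,
    haystack: &'h str,
    threshold: f32,
) -> impl Iterator<Item = &'h str> {
    // Compute the needle's trigrams once and move them into the filter.
    let needle_set = trigrams(needle);
    haystack
        .split(|c: char| !c.is_alphanumeric())
        .filter(|word| !word.is_empty())
        .filter(move |word| {
            let word_set = trigrams(word);
            let shared = needle_set.intersection(&word_set).count();
            // Never zero: a padded word always has at least one trigram.
            let total = needle_set.union(&word_set).count();
            shared as f32 / total as f32 >= threshold
        })
}
```

With the needle "buffalo" and a threshold of 0.3, this sketch keeps "bufalo", "buffalow" and "Bungalo" from a sentence that contains them, and skips unrelated words such as "barn". The threshold is the main design choice: a lower value admits more distant misspellings at the cost of more false matches.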

Usage

Add this to your Cargo.toml:

[dependencies]
trigram = "0.4.4"

and call it like this:

use trigram::similarity;

fn main() {
    // Prints the similarity score of the two strings.
    println!("{}", similarity("rustacean", "crustacean"));
}

Background

The similarity function in this crate is a reverse-engineered approximation of the similarity function in the PostgreSQL pg_trgm extension: https://www.postgresql.org/docs/9.1/pgtrgm.html. It gives exactly the same answers in many cases but, being an approximation, may disagree in others, although no such case is currently known. If you find a case where the answers don't match, please file an issue about it!

A good introduction to the Postgres version of this is given on Stack Overflow: https://stackoverflow.com/a/43161051/484529.

Dependencies

~1.5MB
~42K SLoC