#spellcheck

symspell

Spelling correction & Fuzzy search

9 unstable releases (4 breaking)

Uses old Rust 2015

0.4.1 Apr 16, 2020
0.4.0 Feb 3, 2020
0.3.0 Dec 20, 2019
0.2.0 Nov 11, 2019
0.0.2 Apr 13, 2018

#511 in Algorithms


51 downloads per month
Used in json-surf

MIT license

61KB
1K SLoC

Documentation

SymSpell

Rust implementation of the brilliant SymSpell, originally written in C# by @wolfgarbe.

Usage

extern crate symspell;

use symspell::{AsciiStringStrategy, SymSpell, Verbosity};

fn main() {
    let mut symspell: SymSpell<AsciiStringStrategy> = SymSpell::default();

    symspell.load_dictionary("data/frequency_dictionary_en_82_765.txt", 0, 1, " ");
    symspell.load_bigram_dictionary(
      "./data/frequency_bigramdictionary_en_243_342.txt",
      0,
      2,
      " "
    );

    let suggestions = symspell.lookup("roket", Verbosity::Top, 2);
    println!("{:?}", suggestions);

    let sentence = "whereis th elove hehad dated forImuch of thepast who couqdn'tread in sixtgrade and ins pired him";
    let compound_suggestions = symspell.lookup_compound(sentence, 2);
    println!("{:?}", compound_suggestions);

    let sentence = "whereisthelove";
    let segmented = symspell.word_segmentation(sentence, 2);
    println!("{:?}", segmented);
}

N.B. The dictionary entries have to be lowercase.
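The arguments passed to load_dictionary above (0, 1, " ") appear to select the term column, the count column, and the field separator. The following is a minimal sketch of how one dictionary line could be read under that assumption, lowercasing the term along the way. The helper parse_dictionary_line and the sample lines are hypothetical and are not part of the crate's API.

```rust
/// Parses one line of a frequency dictionary into (term, count).
/// `term_index`, `count_index`, and `separator` mirror the arguments passed
/// to `load_dictionary`; this helper itself is hypothetical.
fn parse_dictionary_line(
    line: &str,
    term_index: usize,
    count_index: usize,
    separator: &str,
) -> Option<(String, u64)> {
    let fields: Vec<&str> = line.trim().split(separator).collect();
    // Lowercase the term, since dictionary entries have to be lowercase.
    let term = fields.get(term_index)?.to_lowercase();
    let count = fields.get(count_index)?.parse::<u64>().ok()?;
    if term.is_empty() {
        return None;
    }
    Some((term, count))
}

fn main() {
    // Made-up sample lines, not taken from the real dictionary file.
    let lines = ["Hello 100", "Rocket 42", "broken"];
    for line in lines.iter() {
        match parse_dictionary_line(line, 0, 1, " ") {
            Some((term, count)) => println!("{} {}", term, count),
            None => eprintln!("skipping malformed line: {:?}", line),
        }
    }
}
```

Normalizing a custom dictionary this way before loading it avoids entries that can never match because of their casing.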

Advanced Usage

Using Custom Settings

use symspell::{AsciiStringStrategy, SymSpell, SymSpellBuilder};

let mut symspell: SymSpell<AsciiStringStrategy> = SymSpellBuilder::default()
    .max_dictionary_edit_distance(2)
    .prefix_length(7)
    .count_threshold(1)
    .build()
    .unwrap();
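The max_dictionary_edit_distance setting caps how many edits may separate a query from a dictionary term. As a rough illustration of what an edit distance counts, here is a sketch using plain Levenshtein distance (insertions, deletions, substitutions). This is an assumption made for simplicity: the crate's own metric may differ, for example by also treating a swap of two adjacent characters as a single edit. The function levenshtein is hypothetical and not part of the crate.

```rust
/// Plain Levenshtein distance between two strings, computed over chars.
/// Illustrative only; not the crate's implementation.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] is the distance between the processed prefix of `a`
    // and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        // curr[0] = i: deleting i chars turns the prefix of `a` into "".
        let mut curr = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution or match
        }
        prev = curr;
    }
    prev[b.len()]
}

fn main() {
    // "roket" needs one inserted 'c' to become "rocket".
    println!("{}", levenshtein("roket", "rocket")); // prints 1
}
```

With a maximum distance of 2, a query like "roket" is within range of "rocket", while terms that need three or more edits would not be suggested.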

String Strategy

A string strategy is an abstraction for string manipulation, such as preprocessing.

There are two strategies included:

  • UnicodeStringStrategy
    • Doesn't do any preprocessing and handles strings as they are.
  • AsciiStringStrategy
    • Transliterates strings into ASCII-only characters.
    • Useful when you are working with accented languages and don't want to deal with accents.

To configure the string strategy, pass it as a type parameter:

use symspell::{AsciiStringStrategy, SymSpell, UnicodeStringStrategy};

let mut ascii_symspell: SymSpell<AsciiStringStrategy> = SymSpell::default();
let mut unicode_symspell: SymSpell<UnicodeStringStrategy> = SymSpell::default();
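To show how a strategy chosen through a type parameter can change behavior, here is a self-contained sketch of the pattern. Every name in it (PrepareStrategy, KeepAsIs, FoldToAscii, Checker) is hypothetical, and the accent folding covers only a handful of Latin-1 letters. It is not the crate's trait or its transliteration logic.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for a string strategy: a preprocessing hook.
trait PrepareStrategy {
    fn prepare(s: &str) -> String;
}

/// Leaves strings untouched.
struct KeepAsIs;

/// Folds a few accented Latin-1 lowercase letters to plain ASCII.
struct FoldToAscii;

impl PrepareStrategy for KeepAsIs {
    fn prepare(s: &str) -> String {
        s.to_string()
    }
}

impl PrepareStrategy for FoldToAscii {
    fn prepare(s: &str) -> String {
        s.chars()
            .map(|c| match c {
                '\u{e0}'..='\u{e5}' => 'a', // a with grave, acute, etc.
                '\u{e8}'..='\u{eb}' => 'e', // e with grave, acute, etc.
                '\u{ec}'..='\u{ef}' => 'i',
                '\u{f2}'..='\u{f6}' => 'o',
                '\u{f9}'..='\u{fc}' => 'u',
                other => other,
            })
            .collect()
    }
}

/// Toy word list that preprocesses every term with the chosen strategy.
struct Checker<T: PrepareStrategy> {
    terms: Vec<String>,
    _strategy: PhantomData<T>,
}

impl<T: PrepareStrategy> Checker<T> {
    fn new() -> Self {
        Checker { terms: Vec::new(), _strategy: PhantomData }
    }

    fn add(&mut self, term: &str) {
        self.terms.push(T::prepare(term));
    }

    fn contains(&self, term: &str) -> bool {
        let prepared = T::prepare(term);
        self.terms.iter().any(|t| *t == prepared)
    }
}

fn main() {
    // "caf\u{e9}" is "cafe" with an acute accent on the final e.
    let mut ascii: Checker<FoldToAscii> = Checker::new();
    ascii.add("caf\u{e9}");
    println!("{}", ascii.contains("cafe")); // prints true

    let mut unicode: Checker<KeepAsIs> = Checker::new();
    unicode.add("caf\u{e9}");
    println!("{}", unicode.contains("cafe")); // prints false
}
```

Because the strategy is fixed at compile time, the same preprocessing is applied to stored terms and to queries, so the two sides always agree.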

JavaScript Bindings

This crate can be compiled for the wasm32 target and exposes a SymSpell class that can be used from JavaScript as follows. Only UnicodeStringStrategy is exported, so if you need ASCII-only matching, the dictionary and the sentences must be transliterated in advance on the JS side.

const fs = require('fs');
const rust = require('./pkg');

let dictionary = fs.readFileSync('data/frequency_dictionary_en_82_765.txt');
let bigram_dict = fs.readFileSync('data/frequency_bigramdictionary_en_243_342.txt');
let sentence = "whereis th elove hehad dated forImuch of thepast who couqdn'tread in sixtgrade and ins pired him";

let symspell = new rust.SymSpell({ max_edit_distance: 2,  prefix_length: 7,  count_threshold: 1});
symspell.load_dictionary(dictionary.buffer, { term_index: 0,  count_index: 1, separator: " "});
symspell.load_bigram_dictionary(bigram_dict.buffer, { term_index: 0,  count_index: 2, separator: " "});
symspell.lookup_compound(sentence, 1);

It can be compiled using wasm-pack (e.g. wasm-pack build --release --target nodejs).

Dependencies

~0.7–1.9MB
~47K SLoC