6 stable releases
Version | Released |
---|---|
1.3.2 | Feb 7, 2023 |
1.3.1 | Sep 8, 2022 |
1.3.0 | Aug 7, 2022 |
1.2.0 | Jul 31, 2022 |
Rust phonetic
This is a Rust port of the phonetic algorithms from Apache commons-codec v1.15.
Algorithms
The following algorithms are currently implemented:
- Caverphone 1
- Caverphone 2
- Cologne
- Daitch Mokotoff Soundex
- Double Metaphone
- Match Rating Approach
- Metaphone
- NYSIIS
- Refined Soundex
- Soundex
- Beider-Morse
Please note that most of these algorithms are designed for ASCII input, and each is usually designed for a specific use case (e.g. English names).
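Because most of the algorithms expect ASCII, it can help to check the input before encoding it. The following is a minimal sketch using only the standard library; `ascii_or_none` is a hypothetical helper and not part of rphonetic. Whether to reject, transliterate, or pass through non-ASCII input depends on the algorithm: the Cologne example, for instance, encodes an umlaut directly.

```rust
/// Hypothetical helper (not part of rphonetic): returns the input unchanged
/// when it is pure ASCII, or `None` when it contains any non-ASCII character.
fn ascii_or_none(input: &str) -> Option<&str> {
    if input.is_ascii() {
        Some(input)
    } else {
        None
    }
}

fn main() {
    // Pure ASCII input passes through untouched.
    assert_eq!(ascii_or_none("Thompson"), Some("Thompson"));
    // Input containing U+00FC (u with diaeresis) is flagged.
    assert_eq!(ascii_or_none("m\u{00FC}ller"), None);
}
```

The caller can then decide what to do with flagged input instead of handing it to an encoder that was not designed for it.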
Examples
Beider-Morse
fn main() -> Result<(), rphonetic::PhoneticError> {
    use std::path::PathBuf;
    use rphonetic::{BeiderMorseBuilder, ConfigFiles, Encoder};

    let config_files = ConfigFiles::new(&PathBuf::from("./test_assets/cc-rules/"))?;
    let builder = BeiderMorseBuilder::new(&config_files);
    let beider_morse = builder.build();
    assert_eq!(beider_morse.encode("Van Helsing"), "(Ylznk|ilzn|ilznk|xilzn|xilznk)-(banilznk|bonilznk|fYnYlznk|fYnilznk|fanYlznk|fanilznk|fonYlznk|fonilznk|vYnYlznk|vYnilznk|vanYlznk|vaniilznk|vanilzn|vanilznk|vonYlznk|voniilznk|vonilzn|vonilznk)");
    Ok(())
}
Caverphone 1 & 2
fn main() {
    use rphonetic::{Caverphone1, Encoder};

    let caverphone = Caverphone1;
    assert_eq!(caverphone.encode("Thompson"), "TMPSN1");
}

fn main() {
    use rphonetic::{Caverphone2, Encoder};

    let caverphone = Caverphone2;
    assert_eq!(caverphone.encode("Thompson"), "TMPSN11111");
}
Cologne
fn main() {
    use rphonetic::{Cologne, Encoder};

    let cologne = Cologne;
    assert_eq!(cologne.encode("m\u{00FC}ller"), "657");
}
Daitch-Mokotoff
fn main() -> Result<(), rphonetic::PhoneticError> {
    use rphonetic::{DaitchMokotoffSoundex, DaitchMokotoffSoundexBuilder, Encoder};

    const COMMONS_CODEC_RULES: &str = include_str!("./rules/dmrules.txt");

    let encoder = DaitchMokotoffSoundexBuilder::with_rules(COMMONS_CODEC_RULES).build()?;
    assert_eq!(encoder.soundex("Rosochowaciec"), "944744|944745|944754|944755|945744|945745|945754|945755");
    Ok(())
}
Double Metaphone
fn main() {
    use rphonetic::{DoubleMetaphone, Encoder};

    let double_metaphone = DoubleMetaphone::default();
    // Primary encoding
    assert_eq!(double_metaphone.encode("jumped"), "JMPT");
    // Alternate encoding
    assert_eq!(double_metaphone.encode_alternate("jumped"), "AMPT");
}
Match Rating Approach
fn main() {
    use rphonetic::{Encoder, MatchRatingApproach};

    let match_rating = MatchRatingApproach;
    assert_eq!(match_rating.encode("Smith"), "SMTH");
}
Metaphone
fn main() {
    use rphonetic::{Encoder, Metaphone};

    let metaphone = Metaphone::default();
    assert_eq!(metaphone.encode("Joanne"), "JN");
}
NYSIIS
fn main() {
    use rphonetic::{Nysiis, Encoder};

    // Strict (the default): the code is truncated to 6 characters
    let nysiis = Nysiis::default();
    assert_eq!(nysiis.encode("WESTERLUND"), "WASTAR");

    // Not strict: the full code is returned
    let nysiis = Nysiis::new(false);
    assert_eq!(nysiis.encode("WESTERLUND"), "WASTARLAD");
}
Refined Soundex
fn main() {
    use rphonetic::{Encoder, RefinedSoundex};

    let refined_soundex = RefinedSoundex::default();
    assert_eq!(refined_soundex.encode("jumped"), "J408106");
}
Soundex
fn main() {
    use rphonetic::{Encoder, Soundex};

    let soundex = Soundex::default();
    assert_eq!(soundex.encode("jumped"), "J513");
}
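To give a feel for where a code such as "J513" comes from, here is a simplified, self-contained sketch of the classic four-character Soundex scheme. It is an illustration only and is not the crate's implementation: `soundex_digit` and `simple_soundex` are hypothetical names, the sketch handles ASCII letters only, and rphonetic's `Soundex` may behave differently on edge cases.

```rust
/// Maps an uppercase ASCII letter to its Soundex digit.
/// Vowels and the letters H, W and Y carry no digit and map to '0'.
fn soundex_digit(c: char) -> char {
    match c {
        'B' | 'F' | 'P' | 'V' => '1',
        'C' | 'G' | 'J' | 'K' | 'Q' | 'S' | 'X' | 'Z' => '2',
        'D' | 'T' => '3',
        'L' => '4',
        'M' | 'N' => '5',
        'R' => '6',
        _ => '0',
    }
}

/// Hypothetical helper: classic Soundex (first letter plus three digits).
fn simple_soundex(name: &str) -> String {
    // Keep ASCII letters only and normalise to uppercase.
    let letters: Vec<char> = name
        .chars()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_uppercase())
        .collect();
    if letters.is_empty() {
        return String::new();
    }

    // The first letter is kept as-is; its digit still suppresses a repeat.
    let mut code = String::from(letters[0]);
    let mut last = soundex_digit(letters[0]);

    for &c in &letters[1..] {
        if code.len() == 4 {
            break;
        }
        // H and W are skipped and do not separate equal digits.
        if c == 'H' || c == 'W' {
            continue;
        }
        let digit = soundex_digit(c);
        // Emit a digit only if it differs from the previous letter's digit.
        if digit != '0' && digit != last {
            code.push(digit);
        }
        // Vowels reset `last`, so equal digits around a vowel are both kept.
        last = digit;
    }

    // Pad short codes with zeros up to four characters.
    while code.len() < 4 {
        code.push('0');
    }
    code
}

fn main() {
    assert_eq!(simple_soundex("jumped"), "J513");
    // Names that sound alike share a code.
    assert_eq!(simple_soundex("Robert"), simple_soundex("Rupert"));
}
```

For "jumped", J is kept, M gives 5, P gives 1, D gives 3, and the vowels contribute nothing, which yields J513.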
Benchmarking
Benchmarks use Criterion.
They were run on an Intel® Core™ i7-4720HQ with 16GB RAM.
To run the benches against the main baseline:
cargo bench --bench benchmark -- --baseline main
To replace the main baseline:
cargo bench --bench benchmark -- --save-baseline main
Do not run Criterion benches on CI.