16 stable releases
Version | Released |
---|---|
3.0.1 (new) | Dec 4, 2024 |
3.0.0 | Oct 15, 2024 |
2.2.1 | Aug 11, 2024 |
2.2.0 | Apr 5, 2024 |
1.2.0 | Jul 31, 2022 |
Rust phonetic
This is a Rust port of the phonetic algorithms from Apache commons-codec v1.15.
Algorithms
The following algorithms are currently implemented:
- Beider-Morse
- Caverphone 1
- Caverphone 2
- Cologne
- Daitch Mokotoff Soundex
- Match Rating Approach
- Metaphone
- Metaphone (Double)
- NYSIIS
- Phonex
- Soundex
- Soundex (Refined)
Please note that most of these algorithms are designed for the Latin alphabet, and each usually targets a specific use case (e.g. English names or English dictionary words).
Examples
Beider-Morse
fn main() -> Result<(), rphonetic::PhoneticError> {
use std::path::PathBuf;
use rphonetic::{BeiderMorseBuilder, ConfigFiles, Encoder};
let config_files = ConfigFiles::new(&PathBuf::from("./test_assets/cc-rules/"))?;
let builder = BeiderMorseBuilder::new(&config_files);
let beider_morse = builder.build();
assert_eq!(beider_morse.encode("Van Helsing"),"(Ylznk|ilzn|ilznk|xilzn|xilznk)-(banilznk|bonilznk|fYnYlznk|fYnilznk|fanYlznk|fanilznk|fonYlznk|fonilznk|vYnYlznk|vYnilznk|vanYlznk|vaniilznk|vanilzn|vanilznk|vonYlznk|voniilznk|vonilzn|vonilznk)");
Ok(())
}
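The encoded value above is a single string. Going by the example output, `-` appears to separate groups and `|` separates the alternative codes inside each parenthesised group. The sketch below splits such a string into its alternatives using only the standard library. `split_alternatives` is a hypothetical helper and not part of rphonetic, and the interpretation of the separators is an assumption drawn from the example.

```rust
/// Splits a Beider-Morse style result such as "(a|b)-(c|d)" into one list
/// of alternatives per group. Hypothetical helper, not part of rphonetic.
fn split_alternatives(encoded: &str) -> Vec<Vec<&str>> {
    encoded
        // Assumption: '-' separates the groups.
        .split('-')
        .map(|group| {
            group
                // Groups may or may not be wrapped in parentheses.
                .trim_start_matches('(')
                .trim_end_matches(')')
                // Assumption: '|' separates alternatives within a group.
                .split('|')
                .filter(|code| !code.is_empty())
                .collect::<Vec<&str>>()
        })
        .collect()
}
```

Calling `split_alternatives` on the "Van Helsing" result would yield two groups, the first with 5 alternatives and the second with 18.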
Caverphone 1 & 2
fn main() {
use rphonetic::{Caverphone1, Encoder};
let caverphone = Caverphone1;
assert_eq!(caverphone.encode("Thompson"), "TMPSN1");
}
fn main() {
use rphonetic::{Caverphone2, Encoder};
let caverphone = Caverphone2;
assert_eq!(caverphone.encode("Thompson"), "TMPSN11111");
}
Cologne
fn main() {
use rphonetic::{Cologne, Encoder};
let cologne = Cologne;
assert_eq!(cologne.encode("m\u{00FC}ller"), "657");
}
Daitch-Mokotoff
fn main() -> Result<(), rphonetic::PhoneticError> {
use rphonetic::{DaitchMokotoffSoundex, DaitchMokotoffSoundexBuilder, Encoder};
const COMMONS_CODEC_RULES: &str = include_str!("./rules/dmrules.txt");
let encoder = DaitchMokotoffSoundexBuilder::with_rules(COMMONS_CODEC_RULES).build()?;
assert_eq!(encoder.soundex("Rosochowaciec"), "944744|944745|944754|944755|945744|945745|945754|945755");
Ok(())
}
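Because one name can produce several `|`-separated codes, a common way to compare two names is to check whether their code lists overlap. The sketch below shows that check with the standard library only. `share_a_code` is a hypothetical helper and not part of rphonetic, and treating any shared code as a match is an assumption about how the output would be used.

```rust
use std::collections::HashSet;

/// Returns true when two `|`-separated code lists have at least one code
/// in common. Hypothetical helper, not part of rphonetic.
fn share_a_code(left: &str, right: &str) -> bool {
    // Collect the non-empty codes of the left-hand side for fast lookup.
    let left_codes: HashSet<&str> = left
        .split('|')
        .filter(|code| !code.is_empty())
        .collect();

    // A single common code is enough to call the two inputs a match.
    right.split('|').any(|code| left_codes.contains(code))
}
```

Each argument would typically be the string returned by `encoder.soundex(...)` for one of the two names being compared.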
Match Rating Approach
fn main() {
use rphonetic::{Encoder, MatchRatingApproach};
let match_rating = MatchRatingApproach;
assert_eq!(match_rating.encode("Smith"), "SMTH");
}
Metaphone
fn main() {
use rphonetic::{Encoder, Metaphone};
let metaphone = Metaphone::default();
assert_eq!(metaphone.encode("Joanne"), "JN");
}
Metaphone (Double)
fn main() {
use rphonetic::{DoubleMetaphone, Encoder};
let double_metaphone = DoubleMetaphone::default();
assert_eq!(double_metaphone.encode("jumped"), "JMPT");
assert_eq!(double_metaphone.encode_alternate("jumped"), "AMPT");
}
Phonex
fn main() {
use rphonetic::{Phonex, Encoder};
// Default configuration
let phonex = Phonex::default();
assert_eq!(phonex.encode("William"),"W450");
}
NYSIIS
fn main() {
use rphonetic::{Nysiis, Encoder};
// Strict
let nysiis = Nysiis::default();
assert_eq!(nysiis.encode("WESTERLUND"),"WASTAR");
// Not strict
let nysiis = Nysiis::new(false);
assert_eq!(nysiis.encode("WESTERLUND"),"WASTARLAD");
}
Soundex
fn main() {
use rphonetic::{Encoder, Soundex};
let soundex = Soundex::default();
assert_eq!(soundex.encode("jumped"), "J513");
}
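To show where a code like `J513` comes from, here is a minimal standard-library sketch of the classic American Soundex rules: keep the first letter, map the remaining consonants to digits, skip repeated digits, and pad to four characters. It is not the crate's implementation. `soundex_digit` and `simple_soundex` are hypothetical names, and the handling of non-ASCII input (it is simply dropped) is an assumption made for brevity.

```rust
/// Maps an uppercase ASCII letter to its Soundex digit, if it has one.
fn soundex_digit(c: char) -> Option<char> {
    match c {
        'B' | 'F' | 'P' | 'V' => Some('1'),
        'C' | 'G' | 'J' | 'K' | 'Q' | 'S' | 'X' | 'Z' => Some('2'),
        'D' | 'T' => Some('3'),
        'L' => Some('4'),
        'M' | 'N' => Some('5'),
        'R' => Some('6'),
        _ => None, // vowels, H, W and Y carry no digit
    }
}

/// Illustrative Soundex sketch; hypothetical, not rphonetic's code.
fn simple_soundex(word: &str) -> String {
    // Assumption: anything that is not an ASCII letter is ignored.
    let mut letters = word
        .chars()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_uppercase());

    let first = match letters.next() {
        Some(c) => c,
        None => return String::new(),
    };

    let mut code = String::from(first);
    let mut last_digit = soundex_digit(first);

    for c in letters {
        if code.len() == 4 {
            break;
        }
        let digit = soundex_digit(c);
        match digit {
            // Emit a digit only when it differs from the previous one.
            Some(d) if Some(d) != last_digit => code.push(d),
            _ => {}
        }
        // H and W do not separate equal digits; vowels do.
        if c != 'H' && c != 'W' {
            last_digit = digit;
        }
    }

    // Pad short codes with zeros up to four characters.
    while code.len() < 4 {
        code.push('0');
    }
    code
}
```

With these rules, "jumped" keeps `J`, then picks up `5` for M, `1` for P and `3` for D.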
Soundex (Refined)
fn main() {
use rphonetic::{Encoder, RefinedSoundex};
let refined_soundex = RefinedSoundex::default();
assert_eq!(refined_soundex.encode("jumped"), "J408106");
}
Benchmarking
Benchmarks use Criterion.
They were run on an Intel® Core™ i7-4720HQ with 16GB RAM.
To run the benchmarks against the main baseline:
cargo bench --bench benchmark -- --baseline main
To replace the main baseline:
cargo bench --bench benchmark -- --save-baseline main
Do not run Criterion benchmarks on CI.