2 releases

0.1.1 Mar 4, 2023
0.1.0 Dec 23, 2022

MPL-2.0 license

furigana

furigana is a Rust library for mapping furigana onto a word, given the word's reading and, optionally, kanji reading data.

Usage

for mapping in furigana::map_naive("物の怪", "もののけ") {
    println!("{mapping}");
}

prints out the following mappings (shown here with the furigana in parentheses after each kanji):

物(も)の怪(のけ)

物(もの)の怪(け)

Only the second mapping (物 read as もの, 怪 read as け) is actually valid, but from a word and its reading alone there is no way to know that.
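To see why both candidates come out, here is a minimal sketch of a naive enumeration. It is not the crate's implementation: `is_kana`, `naive_splits`, and `render` are hypothetical helpers, and the sketch assumes that every non-kana character is a kanji that takes at least one character of the reading, while kana in the word must match the reading literally.

```rust
/// Returns true for characters in the hiragana or katakana Unicode blocks.
fn is_kana(c: char) -> bool {
    matches!(c, '\u{3040}'..='\u{309F}' | '\u{30A0}'..='\u{30FF}')
}

/// Enumerates every way to distribute `reading` over the characters of `word`.
/// Each result pairs a word character with the part of the reading assigned to it.
/// (Hypothetical helper for illustration; not part of the furigana crate.)
fn naive_splits(word: &[char], reading: &[char]) -> Vec<Vec<(char, String)>> {
    let Some((&first, rest)) = word.split_first() else {
        // Word exhausted: this is a valid split only if the reading is used up too.
        return if reading.is_empty() { vec![Vec::new()] } else { Vec::new() };
    };
    let mut results = Vec::new();
    if is_kana(first) {
        // Kana in the word must appear literally at this position in the reading.
        if reading.first() == Some(&first) {
            for mut tail in naive_splits(rest, &reading[1..]) {
                tail.insert(0, (first, first.to_string()));
                results.push(tail);
            }
        }
    } else {
        // Assumed kanji: try every non-empty prefix of the remaining reading.
        for take in 1..=reading.len() {
            let assigned: String = reading[..take].iter().collect();
            for mut tail in naive_splits(rest, &reading[take..]) {
                tail.insert(0, (first, assigned.clone()));
                results.push(tail);
            }
        }
    }
    results
}

/// Renders a split with each kanji's reading in parentheses.
fn render(split: &[(char, String)]) -> String {
    split
        .iter()
        .map(|(c, r)| if is_kana(*c) { c.to_string() } else { format!("{c}({r})") })
        .collect()
}
```

For 物の怪 and もののけ, the middle の can line up with either the second or the third character of the reading, so this sketch yields exactly two candidates: 物(も)の怪(のけ) and 物(もの)の怪(け).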

If given information about kanji readings (for example, from KANJIDIC2), furigana::map is able to grade the potential mappings:

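use std::collections::HashMap;

// Known readings per kanji (for example, loaded from KANJIDIC2).
let mut kanji_to_readings = HashMap::new();
kanji_to_readings.insert("物".to_string(), vec!["もの".to_string()]);
kanji_to_readings.insert("怪".to_string(), vec!["け".to_string()]);
// Keep only the candidate mapping with the highest accuracy.
let mapping = furigana::map("物の怪", "もののけ", &kanji_to_readings)
    .into_iter()
    .max_by_key(|f| f.accuracy)
    .unwrap();
println!("{mapping}");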

assigns a high accuracy value to the correct mapping and a low value to the other one, so that only the correct mapping is printed (again with the furigana in parentheses):

物(もの)の怪(け)
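One way to picture the grading step is the sketch below. It is an assumption for illustration, not the crate's actual scoring: `score` and `best_candidate` are hypothetical helpers, and the score simply counts how many kanji were assigned a reading that appears in the supplied table.

```rust
use std::collections::HashMap;

/// Counts the (kanji, reading) pairs whose reading is listed for that kanji.
/// (Hypothetical scoring rule; the crate's `accuracy` may be computed differently.)
fn score(segments: &[(&str, &str)], kanji_to_readings: &HashMap<String, Vec<String>>) -> usize {
    segments
        .iter()
        .filter(|(kanji, reading)| {
            kanji_to_readings
                .get(*kanji)
                .map_or(false, |known| known.iter().any(|r| r.as_str() == *reading))
        })
        .count()
}

/// Returns the index of the highest-scoring candidate, if there is one.
fn best_candidate(
    candidates: &[Vec<(&str, &str)>],
    kanji_to_readings: &HashMap<String, Vec<String>>,
) -> Option<usize> {
    (0..candidates.len()).max_by_key(|&i| score(&candidates[i], kanji_to_readings))
}
```

With 物 mapped to もの and 怪 mapped to け, the candidate 物(もの)の怪(け) matches both table entries while 物(も)の怪(のけ) matches none, so the former is selected.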

Notes

  • The algorithm used is recursive and not optimised, so it may be inefficient for long, kanji-heavy inputs.

  • If the library fails to produce the correct mapping, or if its accuracy is lower than an incorrect mapping's, a GitHub issue is much appreciated!
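As a rough illustration of the first note, consider a purely naive enumeration (an assumption; the crate may prune earlier). Splitting a reading of n characters across k kanji, each taking at least one character, gives C(n - 1, k - 1) candidates. The hypothetical helper `split_count` below computes that number.

```rust
/// Number of ways to split a reading of `n` characters across `k` kanji,
/// with every kanji receiving at least one character: C(n - 1, k - 1).
/// (Hypothetical helper for illustration; not part of the furigana crate.)
fn split_count(n: u64, k: u64) -> u64 {
    if k == 0 {
        // No kanji: exactly one (empty) split, and only if the reading is empty.
        return u64::from(n == 0);
    }
    if n < k {
        // Not enough reading characters to give each kanji one.
        return 0;
    }
    let (n, k) = (n - 1, k - 1);
    // Use the smaller of k and n - k to keep the loop short.
    let k = k.min(n - k);
    // Multiplicative formula; each intermediate value is itself a binomial
    // coefficient, so the division is always exact.
    (1..=k).fold(1u64, |acc, i| acc * (n - k + i) / i)
}
```

Under this assumption, a 4-kanji word with an 8-character reading has 35 candidate splits, and a 10-kanji word with a 20-character reading has 92378.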

License

Licensed under the Mozilla Public License Version 2.0.

No runtime deps