2 releases

| Version | Date |
|---|---|
| 0.1.1 | Mar 4, 2023 |
| 0.1.0 | Dec 23, 2022 |
furigana
furigana is a Rust library for correctly mapping furigana onto a word, given the word's reading and, optionally, kanji reading data.
Usage
for mapping in furigana::map_naive("物の怪", "もののけ") {
    println!("{mapping}");
}
prints out the following mappings (readings shown here in parentheses):
物(もの)の怪(け)
物(も)の怪(のけ)
Only the first mapping is actually valid, but based only on a word and its reading there is no way to know that.
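To illustrate why a word and its reading alone are ambiguous, here is a minimal, self-contained sketch that enumerates every way the reading can be split across the word. It is not the library's implementation: `candidate_mappings`, `render`, `is_kana`, and the `Segment` type are hypothetical names invented for this example, and it assumes the kana in the word are written in the same script as the reading.

```rust
/// One character of the word plus the reading assigned to it
/// (`None` for kana, which simply read as themselves).
/// Hypothetical type for this sketch, not part of the furigana crate.
type Segment = (char, Option<String>);

/// Rough kana check covering the hiragana and katakana Unicode blocks.
fn is_kana(c: char) -> bool {
    ('\u{3040}'..='\u{30FF}').contains(&c)
}

/// Recursively enumerate every way `reading` can be split across `word`.
fn candidate_mappings(word: &[char], reading: &[char]) -> Vec<Vec<Segment>> {
    let (first, rest) = match word.split_first() {
        Some((&first, rest)) => (first, rest),
        // Word exhausted: the split is valid only if the reading is used up too.
        None => return if reading.is_empty() { vec![vec![]] } else { vec![] },
    };
    let mut results = Vec::new();
    if is_kana(first) {
        // A kana in the word must match the next character of the reading.
        if reading.first() == Some(&first) {
            for mut tail in candidate_mappings(rest, &reading[1..]) {
                tail.insert(0, (first, None));
                results.push(tail);
            }
        }
    } else {
        // A kanji may take any non-empty prefix of the remaining reading.
        for len in 1..=reading.len() {
            let furigana: String = reading[..len].iter().collect();
            for mut tail in candidate_mappings(rest, &reading[len..]) {
                tail.insert(0, (first, Some(furigana.clone())));
                results.push(tail);
            }
        }
    }
    results
}

/// Render a mapping with each kanji's reading in parentheses.
fn render(mapping: &[Segment]) -> String {
    let mut out = String::new();
    for (c, furigana) in mapping {
        out.push(*c);
        if let Some(f) = furigana {
            out.push('(');
            out.push_str(f);
            out.push(')');
        }
    }
    out
}

fn main() {
    let word: Vec<char> = "物の怪".chars().collect();
    let reading: Vec<char> = "もののけ".chars().collect();
    for mapping in candidate_mappings(&word, &reading) {
        println!("{}", render(&mapping));
    }
}
```

For this input the sketch finds exactly two candidates, because the の in the word can line up with either the second or the third character of the reading. The order in which it yields them is an artifact of the sketch and need not match the library's.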
If given information about kanji readings (for example, from KANJIDIC2), furigana::map is able to grade the potential mappings:
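use std::collections::HashMap;

// Known readings for each kanji (for example, loaded from KANJIDIC2).
let mut kanji_to_readings = HashMap::new();
kanji_to_readings.insert("物".to_string(), vec!["もの".to_string()]);
kanji_to_readings.insert("怪".to_string(), vec!["け".to_string()]);

// Keep only the candidate mapping with the highest accuracy.
let mapping = furigana::map("物の怪", "もののけ", &kanji_to_readings)
    .into_iter()
    .max_by_key(|f| f.accuracy)
    .unwrap();
println!("{mapping}");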
This assigns a high accuracy value to the correct mapping and a low value to the other one, so that only the correct mapping is printed (reading shown here in parentheses):
物(もの)の怪(け)
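As a rough illustration of how kanji reading data can separate the two candidates, the sketch below scores each candidate by counting the kanji whose assigned reading appears in the dictionary. This is an assumption made for the example, not the library's actual accuracy computation; `score` and the `(kanji, reading)` pair representation are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical score: the number of kanji whose assigned reading is
/// listed in `kanji_to_readings`. Not the furigana crate's own metric.
fn score(
    candidate: &[(&str, &str)],
    kanji_to_readings: &HashMap<String, Vec<String>>,
) -> usize {
    candidate
        .iter()
        .filter(|(kanji, reading)| {
            kanji_to_readings
                .get(*kanji)
                .map_or(false, |known| known.iter().any(|r| r.as_str() == *reading))
        })
        .count()
}

fn main() {
    let mut kanji_to_readings = HashMap::new();
    kanji_to_readings.insert("物".to_string(), vec!["もの".to_string()]);
    kanji_to_readings.insert("怪".to_string(), vec!["け".to_string()]);

    // The two candidate splits of もののけ, as (kanji, reading) pairs.
    let candidates = vec![
        vec![("物", "も"), ("怪", "のけ")],
        vec![("物", "もの"), ("怪", "け")],
    ];

    // Pick the candidate with the most dictionary-confirmed readings.
    let best = candidates
        .iter()
        .max_by_key(|c| score(c, &kanji_to_readings))
        .unwrap();
    println!("{best:?}");
}
```

Under this scoring the split もの + け has both readings confirmed and the split も + のけ has neither, so the former is selected.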
Notes
- The algorithm used is recursive and not optimised, so it may be inefficient for long, kanji-heavy inputs.
- If the library fails to produce the correct mapping, or if its accuracy is lower than an incorrect mapping's, a GitHub issue is much appreciated!
License
Licensed under the Mozilla Public License Version 2.0.