8 releases

0.1.7 Sep 7, 2024
0.1.6 Jun 7, 2024
0.1.5 May 24, 2024
0.1.4 Mar 10, 2024
0.1.0 Dec 22, 2023

#944 in Text processing

231 downloads per month

MIT license

6MB
395 lines

haoxue-dict

A Chinese dictionary and word segmenter.

Dictionary usage

use haoxue_dict::DICTIONARY;

let entry = DICTIONARY.get_entry("你好").unwrap();
assert_eq!(entry.simplified(), "你好");
assert_eq!(entry.pinyin(), "ni3 hao3");
assert_eq!(prettify_pinyin::prettify(entry.pinyin()), "nǐ hǎo");

use haoxue_dict::DICTIONARY;

let entry = DICTIONARY.get_entry("们").unwrap();
assert_eq!(entry.traditional(), "們");
assert_eq!(entry.pinyin(), "men5");
assert_eq!(prettify_pinyin::prettify(entry.pinyin()), "men");

use haoxue_dict::DICTIONARY;

// 们 is more common than 大学
assert!(DICTIONARY.frequency("们") > DICTIONARY.frequency("大学"));
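The prettify assertions above turn numbered pinyin ("ni3 hao3") into tone-marked pinyin ("nǐ hǎo"). As a rough illustration of what such a conversion involves, here is a minimal standard-library sketch of the usual tone-placement rule. It is not the prettify_pinyin implementation: the function names (numbered_to_marked, mark_syllable, tone_position) are hypothetical, and it assumes lowercase, space-separated syllables with a trailing tone digit from 1 to 5.

```rust
/// Tone-marked forms of each vowel, indexed by tone 1 through 4.
static MARKED: [(char, [char; 4]); 6] = [
    ('a', ['ā', 'á', 'ǎ', 'à']),
    ('e', ['ē', 'é', 'ě', 'è']),
    ('i', ['ī', 'í', 'ǐ', 'ì']),
    ('o', ['ō', 'ó', 'ǒ', 'ò']),
    ('u', ['ū', 'ú', 'ǔ', 'ù']),
    ('ü', ['ǖ', 'ǘ', 'ǚ', 'ǜ']),
];

/// Index of the character that carries the tone mark:
/// an 'a' or 'e' if present, else the 'o' of "ou", else the last vowel.
fn tone_position(chars: &[char]) -> Option<usize> {
    if let Some(i) = chars.iter().position(|&c| c == 'a' || c == 'e') {
        return Some(i);
    }
    if let Some(i) = chars.windows(2).position(|w| w[0] == 'o' && w[1] == 'u') {
        return Some(i);
    }
    chars
        .iter()
        .rposition(|&c| MARKED.iter().any(|(v, _)| *v == c))
}

/// Convert one numbered syllable, e.g. "hao3" into "hǎo".
fn mark_syllable(syllable: &str) -> String {
    let mut chars: Vec<char> = syllable.chars().collect();
    // A trailing digit 1-5 is the tone number; anything else is returned unchanged.
    let tone = match chars.last().and_then(|c| c.to_digit(10)) {
        Some(t) if (1..=5).contains(&t) => t as usize,
        _ => return syllable.to_string(),
    };
    chars.pop();
    // Tone 5 is the neutral tone and carries no mark.
    if tone < 5 {
        if let Some(i) = tone_position(&chars) {
            let vowel = chars[i];
            if let Some((_, marks)) = MARKED.iter().find(|(v, _)| *v == vowel) {
                chars[i] = marks[tone - 1];
            }
        }
    }
    chars.into_iter().collect()
}

/// Convert space-separated numbered pinyin into tone-marked pinyin.
pub fn numbered_to_marked(pinyin: &str) -> String {
    pinyin
        .split(' ')
        .map(mark_syllable)
        .collect::<Vec<_>>()
        .join(" ")
}
```

With this sketch, numbered_to_marked("ni3 hao3") gives "nǐ hǎo" and numbered_to_marked("men5") gives "men", matching the assertions above. Capitalised syllables and other spellings of ü are deliberately left out.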

Segmenter usage

use haoxue_dict::{DICTIONARY, DictEntry};
use either::Either;

let segments = DICTIONARY.segment("明天我会去图书馆。")
                .iter()
                .map(|segment| segment.map_left(DictEntry::simplified))
                .collect::<Vec<_>>();
assert_eq!(segments, vec![
    Either::Left("明天"),
    Either::Left("我"),
    Either::Left("会"),
    Either::Left("去"),
    Either::Left("图书馆"),
    Either::Right("。")
]);
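The README does not say which algorithm the segmenter uses. To give a feel for how dictionary-based segmentation can work, the following is a minimal sketch of greedy forward maximum matching over a small word set. It is an illustrative stand-in, not the crate's method: Segment, segment, and the max_chars parameter are hypothetical names, and the Word/Other variants only mirror the Either::Left / Either::Right split shown above.

```rust
use std::collections::HashSet;

/// A piece of segmented text: a dictionary word, or text with no dictionary match.
#[derive(Debug, PartialEq)]
pub enum Segment<'a> {
    Word(&'a str),
    Other(&'a str),
}

/// Greedy forward maximum matching: at each position take the longest
/// dictionary word (up to `max_chars` characters) that starts there;
/// if none matches, emit a single character as `Other`.
pub fn segment<'a>(text: &'a str, words: &HashSet<&str>, max_chars: usize) -> Vec<Segment<'a>> {
    let max_chars = max_chars.max(1); // always consider at least one character
    let mut out = Vec::new();
    let mut rest = text;
    while !rest.is_empty() {
        // Byte offsets just past each of the first `max_chars` characters.
        let ends: Vec<usize> = rest
            .char_indices()
            .take(max_chars)
            .map(|(i, c)| i + c.len_utf8())
            .collect();
        // Try the longest candidate first.
        let hit = ends
            .iter()
            .rev()
            .copied()
            .find(|&end| words.contains(&rest[..end]));
        match hit {
            Some(end) => {
                out.push(Segment::Word(&rest[..end]));
                rest = &rest[end..];
            }
            None => {
                // No dictionary word starts here: pass one character through.
                let end = ends[0];
                out.push(Segment::Other(&rest[..end]));
                rest = &rest[end..];
            }
        }
    }
    out
}
```

Given a word set containing 明天, 我, 会, 去, 图书 and 图书馆, calling segment("明天我会去图书馆。", &words, 4) yields the five words 明天, 我, 会, 去, 图书馆 followed by Other("。"). The longer 图书馆 wins over 图书 because longer candidates are tried first. A frequency-aware segmenter could instead use the word frequencies to choose between competing splits.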

Feature flags

  • embed-dict: Embed the dictionary in the binary. This is the default feature and adds about 12.4 MiB to the binary size.

Dependencies

~2MB
~29K SLoC