28 releases (12 breaking)
Version | Released |
---|---|
0.23.0 | Feb 23, 2023 |
0.21.0 | Jan 22, 2023 |
0.19.2 | Dec 27, 2022 |
0.18.0 | Oct 26, 2022 |
0.12.2 | Mar 23, 2022 |
Lindera CC-CEDICT Builder
CC-CEDICT dictionary builder for Lindera.
Dictionary format
Refer to the manual for details on the unidic-mecab dictionary format and part-of-speech tags. CC-CEDICT entries use the same MeCab-style field layout, with the fields listed below.
Index | Name (Chinese) | Name (English) | Notes |
---|---|---|---|
0 | 表面形式 | Surface | |
1 | 左语境ID | Left context ID | |
2 | 右语境ID | Right context ID | |
3 | 成本 | Cost | |
4 | 词类 | Major POS classification | |
5 | 词类1 | Middle POS classification | |
6 | 词类2 | Small POS classification | |
7 | 词类3 | Fine POS classification | |
8 | 拼音 | Pinyin | |
9 | 繁体字 | Traditional | |
10 | 简体字 | Simplified | |
11 | 定义 | Definition | |
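As an illustration of the field order above, here is a minimal sketch that parses one entry into a struct using only the Rust standard library. It is not Lindera's own parser. The names `CedictEntry`, `parse_entry`, and `num` are hypothetical, and the sketch assumes plain comma-separated lines without quoting, unsigned 16-bit context IDs, and a signed 16-bit cost.

```rust
/// One dictionary entry, with fields in the order of the table above.
/// The integer widths are assumptions made for this sketch.
#[derive(Debug, PartialEq)]
pub struct CedictEntry {
    pub surface: String,
    pub left_context_id: u16,
    pub right_context_id: u16,
    pub cost: i16,
    /// Major, middle, small, and fine POS classification (indices 4 to 7).
    pub pos: [String; 4],
    pub pinyin: String,
    pub traditional: String,
    pub simplified: String,
    pub definition: String,
}

/// Parses a numeric field and names the field in the error message.
fn num<T: std::str::FromStr>(value: &str, name: &str) -> Result<T, String> {
    value
        .parse::<T>()
        .map_err(|_| format!("invalid {}: {:?}", name, value))
}

/// Parses one comma-separated line into a `CedictEntry`.
/// Assumes unquoted fields; only the last field may contain commas.
pub fn parse_entry(line: &str) -> Result<CedictEntry, String> {
    // splitn(12, ..) stops splitting after 11 commas, so commas inside
    // the definition (index 11) are preserved.
    let f: Vec<&str> = line.trim_end().splitn(12, ',').collect();
    if f.len() != 12 {
        return Err(format!("expected 12 fields, found {}", f.len()));
    }
    Ok(CedictEntry {
        surface: f[0].to_string(),
        left_context_id: num::<u16>(f[1], "left context ID")?,
        right_context_id: num::<u16>(f[2], "right context ID")?,
        cost: num::<i16>(f[3], "cost")?,
        pos: [
            f[4].to_string(),
            f[5].to_string(),
            f[6].to_string(),
            f[7].to_string(),
        ],
        pinyin: f[8].to_string(),
        traditional: f[9].to_string(),
        simplified: f[10].to_string(),
        definition: f[11].to_string(),
    })
}
```

For example, `parse_entry("你好,0,0,0,*,*,*,*,ni3 hao3,你好,你好,hello/hi/")` yields an entry whose `pinyin` is `ni3 hao3`. That line is invented for illustration and is not copied from the dictionary.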
User dictionary format (CSV)
Simple version
Index | Name (Chinese) | Name (English) | Notes |
---|---|---|---|
0 | 表面形式 | Surface | |
1 | 词类 | Major POS classification | |
2 | 拼音 | Pinyin | |
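A simple-version row can be produced with a few lines of standard-library Rust. The sketch below is illustrative only: `simple_row` is a hypothetical helper, not part of Lindera, and it assumes the file uses no quoting, so it rejects any field that contains a comma or a line break.

```rust
/// Builds one line of the simple user dictionary: surface, major POS, pinyin.
/// Assumes an unquoted CSV file, so fields containing a comma or a line
/// break are rejected instead of being escaped.
pub fn simple_row(surface: &str, pos: &str, pinyin: &str) -> Result<String, String> {
    let fields = [surface, pos, pinyin];
    for field in fields.iter() {
        if field.contains(|c: char| matches!(c, ',' | '\n' | '\r')) {
            return Err(format!("field {:?} contains a separator", field));
        }
    }
    Ok(fields.join(","))
}
```

Calling `simple_row("东京天空树", "名词", "dong1 jing1 tian1 kong1 shu4")` returns the line `东京天空树,名词,dong1 jing1 tian1 kong1 shu4`. The word and its POS tag are example values chosen for this sketch.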
Detailed version
Index | Name (Chinese) | Name (English) | Notes |
---|---|---|---|
0 | 表面形式 | Surface | |
1 | 左语境ID | Left context ID | |
2 | 右语境ID | Right context ID | |
3 | 成本 | Cost | |
4 | 词类 | POS | |
5 | 词类1 | POS subcategory 1 | |
6 | 词类2 | POS subcategory 2 | |
7 | 词类3 | POS subcategory 3 | |
8 | 拼音 | Pinyin | |
9 | 繁体字 | Traditional | |
10 | 简体字 | Simplified | |
11 | 定义 | Definition | |
12 | - | - | After 12, it can be freely expanded. |
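The two user dictionary layouts differ in field count: 3 fields for the simple version, and 12 or more for the detailed version. The sketch below tells them apart on that basis. Whether Lindera itself distinguishes rows this way is an assumption here. `RowKind` and `classify_row` are hypothetical names, and the code assumes that no field contains a comma.

```rust
/// Which user dictionary layout a row follows.
#[derive(Debug, PartialEq)]
pub enum RowKind {
    /// Exactly 3 fields: surface, major POS, pinyin.
    Simple,
    /// 12 fixed fields (indices 0 to 11) plus any freely added ones.
    Detailed { extra_fields: usize },
}

/// Classifies a row by counting its comma-separated fields.
/// Assumes unquoted fields that contain no commas.
pub fn classify_row(line: &str) -> Result<RowKind, String> {
    let count = line.trim_end().split(',').count();
    match count {
        3 => Ok(RowKind::Simple),
        n if n >= 12 => Ok(RowKind::Detailed { extra_fields: n - 12 }),
        n => Err(format!("expected 3 or at least 12 fields, found {}", n)),
    }
}
```

A row with 13 fields is classified as `Detailed { extra_fields: 1 }`, and a row with, say, 5 fields is reported as an error instead of being guessed at.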
How to use CC-CEDICT dictionary
For more details about the lindera command, please refer to the following URL:
API reference
The API reference is available. Please see the following URL: