#chinese #morphological #dictionary #builder #cc-cedict

lindera-cc-cedict-builder

A Chinese morphological dictionary builder for CC-CEDICT

28 releases (12 breaking)

0.23.0 Feb 23, 2023
0.21.0 Jan 22, 2023
0.19.2 Dec 27, 2022
0.18.0 Oct 26, 2022
0.12.2 Mar 23, 2022

#220 in Text processing


11,422 downloads per month
Used in 14 crates (2 directly)

MIT license

68KB
1.5K SLoC

Lindera CC-CEDICT Builder

License: MIT. Join the chat at https://gitter.im/lindera-morphology/lindera

CC-CEDICT dictionary builder for Lindera.

Dictionary format

Refer to the manual for details on the unidic-mecab dictionary format and part-of-speech tags.

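| Index | Name (Chinese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表面形式 | Surface | |
| 1 | 左语境ID | Left context ID | |
| 2 | 右语境ID | Right context ID | |
| 3 | 成本 | Cost | |
| 4 | 词类 | Major POS classification | |
| 5 | 词类1 | Middle POS classification | |
| 6 | 词类2 | Small POS classification | |
| 7 | 词类3 | Fine POS classification | |
| 8 | 拼音 | Pinyin | |
| 9 | 繁体字 | Traditional | |
| 10 | 简体字 | Simplified | |
| 11 | 定义 | Definition | |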
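
As an illustration of the 12-column layout above, here is a minimal Rust sketch that reads one dictionary row into a struct. The `CedictEntry` type and the `parse_entry` helper are hypothetical names invented for this example and are not part of the Lindera API. The sketch also assumes plain comma-separated fields with no quoting in the first eleven columns, which may not hold for every row of a real dictionary file.

```rust
/// One dictionary row, mirroring the 12 columns listed in the table above.
/// Hypothetical type for illustration only; not part of the Lindera API.
#[derive(Debug, PartialEq)]
pub struct CedictEntry {
    pub surface: String,
    pub left_context_id: u32,
    pub right_context_id: u32,
    pub cost: i32,
    /// Major, middle, small, and fine POS classifications (indexes 4 to 7).
    pub pos: [String; 4],
    pub pinyin: String,
    pub traditional: String,
    pub simplified: String,
    pub definition: String,
}

/// Parses one comma-separated row (assumption: no quoted fields before the definition).
pub fn parse_entry(line: &str) -> Result<CedictEntry, String> {
    // splitn(12, ..) leaves any further commas inside the final (definition) field.
    let f: Vec<&str> = line.trim_end().splitn(12, ',').collect();
    if f.len() != 12 {
        return Err(format!("expected 12 fields, found {}", f.len()));
    }
    let left_context_id = f[1]
        .parse::<u32>()
        .map_err(|e| format!("invalid left context ID: {e}"))?;
    let right_context_id = f[2]
        .parse::<u32>()
        .map_err(|e| format!("invalid right context ID: {e}"))?;
    let cost = f[3]
        .parse::<i32>()
        .map_err(|e| format!("invalid cost: {e}"))?;
    Ok(CedictEntry {
        surface: f[0].to_string(),
        left_context_id,
        right_context_id,
        cost,
        pos: [
            f[4].to_string(),
            f[5].to_string(),
            f[6].to_string(),
            f[7].to_string(),
        ],
        pinyin: f[8].to_string(),
        traditional: f[9].to_string(),
        simplified: f[10].to_string(),
        definition: f[11].to_string(),
    })
}
```

Calling `parse_entry` on a row with fewer than 12 fields, or with a non-numeric context ID or cost, returns an `Err` describing the problem instead of panicking.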

User dictionary format (CSV)

Simple version

| Index | Name (Chinese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表面形式 | Surface | |
| 1 | 词类 | Major POS classification | |
| 2 | 拼音 | Pinyin | |

Detailed version

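| Index | Name (Chinese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表面形式 | Surface | |
| 1 | 左语境ID | Left context ID | |
| 2 | 右语境ID | Right context ID | |
| 3 | 成本 | Cost | |
| 4 | 词类 | POS | |
| 5 | 词类1 | POS subcategory 1 | |
| 6 | 词类2 | POS subcategory 2 | |
| 7 | 词类3 | POS subcategory 3 | |
| 8 | 拼音 | Pinyin | |
| 9 | 繁体字 | Traditional | |
| 10 | 简体字 | Simplified | |
| 11 | 定义 | Definition | |
| 12 | - | - | From index 12 onward, columns can be freely added. |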
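
The two user dictionary layouts can be told apart by their column count. The following Rust sketch shows one way to do that, assuming a row with exactly 3 fields is the simple version and a row with 12 or more fields is the detailed version. `UserEntry` and `parse_user_entry` are hypothetical names made up for this example, not Lindera's own loader, and the sketch splits on plain commas without handling quoted fields.

```rust
/// A user dictionary row in either of the two layouts described above.
/// Hypothetical type for illustration only; not part of the Lindera API.
#[derive(Debug, PartialEq)]
pub enum UserEntry {
    /// Simple version: surface, major POS classification, pinyin.
    Simple {
        surface: String,
        pos: String,
        pinyin: String,
    },
    /// Detailed version: the 12 core columns plus any free-form columns after them.
    Detailed {
        core: Vec<String>,
        extra: Vec<String>,
    },
}

/// Classifies one comma-separated row by its field count (assumption: no quoting).
pub fn parse_user_entry(line: &str) -> Result<UserEntry, String> {
    let fields: Vec<&str> = line.trim_end().split(',').collect();
    match fields.as_slice() {
        // Exactly three fields: the simple layout.
        [surface, pos, pinyin] => Ok(UserEntry::Simple {
            surface: surface.to_string(),
            pos: pos.to_string(),
            pinyin: pinyin.to_string(),
        }),
        // Twelve or more fields: the detailed layout with optional extra columns.
        all if all.len() >= 12 => {
            let (core, extra) = all.split_at(12);
            Ok(UserEntry::Detailed {
                core: core.iter().map(|s| s.to_string()).collect(),
                extra: extra.iter().map(|s| s.to_string()).collect(),
            })
        }
        other => Err(format!(
            "expected 3 fields or at least 12, found {}",
            other.len()
        )),
    }
}
```

Keeping the extra columns in a separate vector reflects the note that everything from index 12 onward is free-form, so a caller can ignore it or interpret it however the application requires.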

How to use CC-CEDICT dictionary

For more details about the lindera command, please refer to the following URL:

API reference

The API reference is available. Please see the following URL:

Dependencies

~10MB
~251K SLoC