#cc-cedict #morphological #dictionary #chinese

lindera-cc-cedict

A Chinese morphological dictionary for CC-CEDICT

54 releases (30 breaking)

0.42.2 Apr 29, 2025
0.41.0 Apr 13, 2025
0.40.1 Mar 27, 2025
0.38.1 Nov 30, 2024
0.12.2 Mar 23, 2022

#2243 in Text processing

28,345 downloads per month
Used in 12 crates (via lindera)

MIT license

140KB
3K SLoC

Lindera CC-CEDICT

Dictionary version

This repository contains CC-CEDICT-MeCab.

Dictionary format

Refer to the manual for details on the unidic-mecab dictionary format and part-of-speech tags.

| Index | Name (Chinese) | Name (English) | Notes |
|---|---|---|---|
| 0 | 表面形式 | Surface | |
| 1 | 左语境ID | Left context ID | |
| 2 | 右语境ID | Right context ID | |
| 3 | 成本 | Cost | |
| 4 | 词类 | Major POS classification | |
| 5 | 词类1 | Middle POS classification | |
| 6 | 词类2 | Small POS classification | |
| 7 | 词类3 | Fine POS classification | |
| 8 | 拼音 | Pinyin | |
| 9 | 繁体字 | Traditional | |
| 10 | 简体字 | Simplified | |
| 11 | 定义 | Definition | |
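As a rough illustration of the 12-column layout above, the following Rust sketch parses one row of the system dictionary CSV into a struct. The names `CedictEntry`, `split_csv_line`, and `parse_entry` are hypothetical and not part of Lindera's API. The sketch also assumes RFC 4180-style double-quote escaping (definitions can contain commas) and that context IDs and costs fit in `u32`/`i32`.

```rust
/// One row of the system dictionary, following the 12 columns in the table above.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct CedictEntry {
    surface: String,
    left_id: u32,
    right_id: u32,
    cost: i32,
    pos: [String; 4], // major, middle, small, fine POS classification
    pinyin: String,
    traditional: String,
    simplified: String,
    definition: String,
}

/// Split one CSV line into fields. A double-quoted field may contain commas,
/// and `""` inside quotes is a literal quote (assumed RFC 4180 convention).
/// Multi-line quoted fields are not supported.
fn split_csv_line(line: &str) -> Vec<String> {
    let mut fields = Vec::new();
    let mut field = String::new();
    let mut in_quotes = false;
    let mut chars = line.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '"' if in_quotes => {
                if chars.peek() == Some(&'"') {
                    field.push('"'); // escaped quote
                    chars.next();
                } else {
                    in_quotes = false; // closing quote
                }
            }
            '"' => in_quotes = true, // opening quote
            ',' if !in_quotes => fields.push(std::mem::take(&mut field)),
            _ => field.push(c),
        }
    }
    fields.push(field);
    fields
}

/// Parse one system dictionary row, rejecting rows without exactly 12 columns.
fn parse_entry(line: &str) -> Result<CedictEntry, String> {
    let f = split_csv_line(line);
    if f.len() != 12 {
        return Err(format!("expected 12 columns, got {}", f.len()));
    }
    Ok(CedictEntry {
        surface: f[0].clone(),
        left_id: f[1].parse::<u32>().map_err(|e| format!("left context ID: {e}"))?,
        right_id: f[2].parse::<u32>().map_err(|e| format!("right context ID: {e}"))?,
        cost: f[3].parse::<i32>().map_err(|e| format!("cost: {e}"))?,
        pos: [f[4].clone(), f[5].clone(), f[6].clone(), f[7].clone()],
        pinyin: f[8].clone(),
        traditional: f[9].clone(),
        simplified: f[10].clone(),
        definition: f[11].clone(),
    })
}
```

For a made-up row such as `你好,0,0,100,*,*,*,*,ni3 hao3,你好,你好,"hello, hi"`, the parser yields twelve fields and keeps the quoted definition intact as `hello, hi`.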

User dictionary format (CSV)

Simple version

| Index | Name (Chinese) | Name (English) | Notes |
|---|---|---|---|
| 0 | 表面形式 | Surface | |
| 1 | 词类 | Major POS classification | |
| 2 | 拼音 | Pinyin | |
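The simple format needs only three columns. Below is a minimal sketch of reading such a file, assuming one entry per line and that no field contains a comma, so no CSV quoting is handled. `SimpleUserEntry` and `parse_simple_user_dict` are illustrative names, and how Lindera fills in the omitted context IDs and cost is not covered here.

```rust
/// One row of a simple-format user dictionary: surface, POS, pinyin.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct SimpleUserEntry {
    surface: String,
    pos: String,
    pinyin: String,
}

/// Parse a whole simple-format user dictionary, one `surface,POS,pinyin` row per line.
/// Assumes no field contains a comma; blank lines are skipped.
fn parse_simple_user_dict(text: &str) -> Result<Vec<SimpleUserEntry>, String> {
    let mut entries = Vec::new();
    for (n, line) in text.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() {
            continue; // tolerate blank lines
        }
        let cols: Vec<&str> = line.split(',').collect();
        if cols.len() != 3 {
            // Report 1-based line numbers to match what an editor shows.
            return Err(format!("line {}: expected 3 columns, got {}", n + 1, cols.len()));
        }
        entries.push(SimpleUserEntry {
            surface: cols[0].to_string(),
            pos: cols[1].to_string(),
            pinyin: cols[2].to_string(),
        });
    }
    Ok(entries)
}
```

Rejecting rows with the wrong column count up front, with a line number, makes a malformed user dictionary easier to fix than a silent mis-parse would.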

Detailed version

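| Index | Name (Chinese) | Name (English) | Notes |
|---|---|---|---|
| 0 | 表面形式 | Surface | |
| 1 | 左语境ID | Left context ID | |
| 2 | 右语境ID | Right context ID | |
| 3 | 成本 | Cost | |
| 4 | 词类 | POS | |
| 5 | 词类1 | POS subcategory 1 | |
| 6 | 词类2 | POS subcategory 2 | |
| 7 | 词类3 | POS subcategory 3 | |
| 8 | 拼音 | Pinyin | |
| 9 | 繁体字 | Traditional | |
| 10 | 简体字 | Simplified | |
| 11 | 定义 | Definition | |
| 12 | - | - | Columns from index 12 onward are free-form and can be added as needed. |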
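Going the other way, a detailed-format row can be generated from a struct. The hypothetical `DetailedUserEntry` below holds the twelve fixed columns plus any extra columns from index 12 onward. Fields containing commas, quotes, or newlines are quoted RFC 4180-style; that quoting rule is an assumption about how the CSV is read, not documented Lindera behaviour.

```rust
/// Quote a CSV field if it contains a comma, a double quote, or a newline;
/// embedded quotes are doubled (assumed RFC 4180 convention).
fn csv_field(s: &str) -> String {
    if s.contains(|c: char| c == ',' || c == '"' || c == '\n') {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

/// A detailed-format user dictionary row. Hypothetical type for illustration only.
struct DetailedUserEntry<'a> {
    surface: &'a str,
    left_id: u32,
    right_id: u32,
    cost: i32,
    pos: [&'a str; 4], // POS plus subcategories 1 to 3
    pinyin: &'a str,
    traditional: &'a str,
    simplified: &'a str,
    definition: &'a str,
    extra: Vec<&'a str>, // free-form columns from index 12 onward
}

impl DetailedUserEntry<'_> {
    /// Serialize the entry as one CSV row in the column order of the table above.
    fn to_csv_row(&self) -> String {
        let mut cols: Vec<String> = vec![
            csv_field(self.surface),
            self.left_id.to_string(),
            self.right_id.to_string(),
            self.cost.to_string(),
        ];
        cols.extend(self.pos.iter().map(|p| csv_field(p)));
        for s in [self.pinyin, self.traditional, self.simplified, self.definition] {
            cols.push(csv_field(s));
        }
        cols.extend(self.extra.iter().map(|e| csv_field(e)));
        cols.join(",")
    }
}
```

Keeping the extra columns in a separate `Vec` mirrors the table: indices 0 to 11 are fixed, and everything after is appended in order.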

API reference

The API reference is available at the following URL:

Dependencies

~12–23MB
~380K SLoC