#japanese #morphological #dictionary #builder #unidic

lindera-unidic-builder

A Japanese morphological dictionary builder for UniDic

38 releases (17 breaking)

0.23.0 Feb 23, 2023
0.21.0 Jan 22, 2023
0.19.2 Dec 27, 2022
0.18.0 Oct 26, 2022
0.3.2 Feb 20, 2020

#128 in Text processing

11,300 downloads per month
Used in 14 crates (2 directly)

MIT license

69KB
1.5K SLoC

Lindera UniDic Builder

License: MIT

Join the chat at https://gitter.im/lindera-morphology/lindera

UniDic builder for Lindera.

Dictionary version

This project supports UniDic 2.1.2. See the UniDic documentation for details.

Dictionary format

Refer to the manual for details on the unidic-mecab dictionary format and part-of-speech tags.

| Index | Name (Japanese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表層形 | Surface | |
| 1 | 左文脈ID | Left context ID | |
| 2 | 右文脈ID | Right context ID | |
| 3 | コスト | Cost | |
| 4 | 品詞大分類 | Major POS classification | |
| 5 | 品詞中分類 | Middle POS classification | |
| 6 | 品詞小分類 | Small POS classification | |
| 7 | 品詞細分類 | Fine POS classification | |
| 8 | 活用型 | Conjugation type | |
| 9 | 活用形 | Conjugation form | |
| 10 | 語彙素読み | Lexeme reading | |
| 11 | 語彙素(語彙素表記 + 語彙素細分類) | Lexeme | |
| 12 | 書字形出現形 | Orthographic surface form | |
| 13 | 発音形出現形 | Pronunciation surface form | |
| 14 | 書字形基本形 | Orthographic base form | |
| 15 | 発音形基本形 | Pronunciation base form | |
| 16 | 語種 | Word origin | |
| 17 | 語頭変化型 | Word-initial change type | |
| 18 | 語頭変化形 | Word-initial change form | |
| 19 | 語末変化型 | Word-final change type | |
| 20 | 語末変化形 | Word-final change form | |
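
As a rough illustration of how the field indices above map onto one comma-separated row, here is a minimal sketch that uses only the Rust standard library. It is not Lindera's own parser: the `Entry` type, the `parse_row` function, and the constants are hypothetical names introduced for this example, and the naive comma split assumes no field is quoted or contains a comma.

```rust
// Field indices taken from the dictionary format table.
const SURFACE: usize = 0;
const COST: usize = 3;
const POS_MAJOR: usize = 4;
const LEXEME_READING: usize = 10;
// Indices 0..=20 give 21 fields per row.
const FIELD_COUNT: usize = 21;

/// A few fields picked out of one dictionary row (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct Entry {
    surface: String,
    cost: i32,
    pos_major: String,
    lexeme_reading: String,
}

/// Parses one row with a naive comma split.
/// Assumption: no field is quoted or contains a comma.
fn parse_row(line: &str) -> Result<Entry, String> {
    let fields: Vec<&str> = line.trim_end().split(',').collect();
    if fields.len() < FIELD_COUNT {
        return Err(format!(
            "expected at least {} fields, found {}",
            FIELD_COUNT,
            fields.len()
        ));
    }
    // The cost column is the only numeric field this sketch reads.
    let cost = fields[COST]
        .parse::<i32>()
        .map_err(|e| format!("invalid cost {:?}: {}", fields[COST], e))?;
    Ok(Entry {
        surface: fields[SURFACE].to_string(),
        cost,
        pos_major: fields[POS_MAJOR].to_string(),
        lexeme_reading: fields[LEXEME_READING].to_string(),
    })
}
```

Calling `parse_row` on a 21-field row returns the selected fields, while a shorter row is reported as an error. Real dictionary sources may quote some fields, so a proper CSV parser would be the safer choice outside of a sketch like this.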

User dictionary format (CSV)

Simple version

| Index | Name (Japanese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表層形 | Surface | |
| 1 | 品詞大分類 | Major POS classification | |
| 2 | 語彙素読み | Lexeme reading | |
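
To make the three-field layout concrete, the following sketch renders one simple user dictionary entry as a CSV line. `SimpleEntry` and `to_csv_line` are hypothetical names used only for illustration, and the sketch does no CSV quoting, so it rejects fields that would need it.

```rust
/// One entry of the simple (three-field) user dictionary format.
/// Hypothetical helper type, not part of the Lindera API.
struct SimpleEntry<'a> {
    surface: &'a str,
    pos_major: &'a str,
    lexeme_reading: &'a str,
}

impl SimpleEntry<'_> {
    /// Renders the entry as one CSV line in index order 0, 1, 2.
    /// Fields containing a comma or newline are rejected because
    /// this sketch does not implement quoting.
    fn to_csv_line(&self) -> Result<String, String> {
        let fields = [self.surface, self.pos_major, self.lexeme_reading];
        for field in fields.iter() {
            if field.contains(',') || field.contains('\n') {
                return Err(format!("field {:?} would need quoting", field));
            }
        }
        Ok(fields.join(","))
    }
}
```

For example, an entry with surface `東京スカイツリー`, POS `カスタム名詞`, and reading `トウキョウスカイツリー` becomes a single line with those three values separated by commas.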

Detailed version

| Index | Name (Japanese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表層形 | Surface | |
| 1 | 左文脈ID | Left context ID | |
| 2 | 右文脈ID | Right context ID | |
| 3 | コスト | Cost | |
| 4 | 品詞大分類 | Major POS classification | |
| 5 | 品詞中分類 | Middle POS classification | |
| 6 | 品詞小分類 | Small POS classification | |
| 7 | 品詞細分類 | Fine POS classification | |
| 8 | 活用型 | Conjugation type | |
| 9 | 活用形 | Conjugation form | |
| 10 | 語彙素読み | Lexeme reading | |
| 11 | 語彙素(語彙素表記 + 語彙素細分類) | Lexeme | |
| 12 | 書字形出現形 | Orthographic surface form | |
| 13 | 発音形出現形 | Pronunciation surface form | |
| 14 | 書字形基本形 | Orthographic base form | |
| 15 | 発音形基本形 | Pronunciation base form | |
| 16 | 語種 | Word origin | |
| 17 | 語頭変化型 | Word-initial change type | |
| 18 | 語頭変化形 | Word-initial change form | |
| 19 | 語末変化型 | Word-final change type | |
| 20 | 語末変化形 | Word-final change form | |
| 21 | - | - | Fields from index 21 onward can be added freely. |
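
The two user dictionary layouts differ in field count: three fields for the simple version, and 21 or more for the detailed version. The sketch below classifies a row on that basis. How the builder itself tells the two formats apart is not stated here, so treat the field-count rule, along with the hypothetical `RowKind` and `classify_row` names, as assumptions made for illustration. The comma split is naive and does not handle quoted fields.

```rust
// Field counts derived from the two user dictionary tables.
const SIMPLE_FIELDS: usize = 3;
const DETAILED_MIN_FIELDS: usize = 21;

/// Which user dictionary layout a row appears to follow
/// (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum RowKind {
    Simple,
    /// `extra_fields` counts the free-form fields from index 21 onward.
    Detailed { extra_fields: usize },
}

/// Classifies a row by its number of comma-separated fields.
/// Assumption: no field is quoted or contains a comma.
fn classify_row(line: &str) -> Result<RowKind, String> {
    let count = line.trim_end().split(',').count();
    if count == SIMPLE_FIELDS {
        Ok(RowKind::Simple)
    } else if count >= DETAILED_MIN_FIELDS {
        Ok(RowKind::Detailed {
            extra_fields: count - DETAILED_MIN_FIELDS,
        })
    } else {
        Err(format!(
            "row has {} fields; expected {} or at least {}",
            count, SIMPLE_FIELDS, DETAILED_MIN_FIELDS
        ))
    }
}
```

A check like this can catch malformed rows, such as a detailed row with a missing column, before the file is handed to the builder.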

How to use the UniDic dictionary

For more details about the lindera command, please refer to the following URL:

API reference

The API reference is available. Please see the following URL:

Dependencies

~10MB
~252K SLoC