8 releases

0.1.9 Oct 24, 2021
0.1.8 Oct 24, 2021


Apache-2.0 OR MIT

32KB
825 lines

Goya


Goya is a Japanese morphological analyzer written in Rust.
Its main goal is to compile to WebAssembly so that morphological analysis can run in browsers and other JavaScript runtimes. It can also be used from the command line or as a Rust library.
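This README does not describe Goya's internals. As background, MeCab, whose IPA dictionary Goya compiles, segments a sentence by choosing the cheapest path through a lattice of dictionary words. Each path's cost combines a per-word cost with a connection cost between adjacent parts of speech. The sketch below is a toy version of that idea and is not Goya's code or API. The `Pos`, `Entry`, and `segment` names, the five-word dictionary, and every cost value are hypothetical.

```rust
/// Part-of-speech classes for this toy model (hypothetical, far coarser than IPADIC).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Pos {
    Bos, // beginning of sentence
    Noun,
    Particle,
}

/// A hypothetical dictionary entry: surface form, part of speech, and word cost.
struct Entry {
    surface: &'static str,
    pos: Pos,
    cost: i32,
}

/// Cost of placing a word of class `right` directly after one of class `left`.
/// Made-up values: noun-noun, particle-particle and sentence-initial particles are penalized.
fn connection_cost(left: Pos, right: Pos) -> i32 {
    match (left, right) {
        (Pos::Noun, Pos::Noun) | (Pos::Particle, Pos::Particle) | (Pos::Bos, Pos::Particle) => 100,
        _ => 0,
    }
}

#[derive(Clone, Copy)]
struct Cell {
    cost: i32,                  // cheapest total cost found so far
    prev: Option<(usize, Pos)>, // back-pointer: (byte offset, POS) of the previous node
    surface: &'static str,      // word that ends at this node
}

/// Viterbi search over the word lattice: returns the cheapest segmentation, if any.
fn segment(text: &str, dict: &[Entry]) -> Option<Vec<&'static str>> {
    let n = text.len();
    // best[i][p]: cheapest path ending at byte offset i with a word of class p.
    let mut best: Vec<[Option<Cell>; 3]> = vec![[None; 3]; n + 1];
    best[0][Pos::Bos as usize] = Some(Cell { cost: 0, prev: None, surface: "" });

    for i in 0..n {
        for left in [Pos::Bos, Pos::Noun, Pos::Particle] {
            // Only offsets reached by whole words are expanded, so `i` is a char boundary.
            let Some(cell) = best[i][left as usize] else { continue };
            for e in dict {
                if !text[i..].starts_with(e.surface) {
                    continue;
                }
                let j = i + e.surface.len();
                let cost = cell.cost + connection_cost(left, e.pos) + e.cost;
                let slot = &mut best[j][e.pos as usize];
                if slot.map_or(true, |c| cost < c.cost) {
                    *slot = Some(Cell { cost, prev: Some((i, left)), surface: e.surface });
                }
            }
        }
    }

    // Cheapest way to end the sentence (end-of-sentence connection costs are 0 here).
    let mut last = [Pos::Noun, Pos::Particle]
        .into_iter()
        .filter(|p| best[n][*p as usize].is_some())
        .min_by_key(|p| best[n][*p as usize].unwrap().cost)?;
    let mut pos = n;

    // Follow back-pointers from the end to the beginning of the sentence.
    let mut words = Vec::new();
    while let Some(cell) = best[pos][last as usize] {
        let Some((prev_pos, prev_last)) = cell.prev else { break };
        words.push(cell.surface);
        pos = prev_pos;
        last = prev_last;
    }
    words.reverse();
    Some(words)
}

fn main() {
    // Toy dictionary with hypothetical costs (not IPADIC values).
    let dict = [
        Entry { surface: "すもも", pos: Pos::Noun, cost: 10 },
        Entry { surface: "もも", pos: Pos::Noun, cost: 10 },
        Entry { surface: "うち", pos: Pos::Noun, cost: 10 },
        Entry { surface: "も", pos: Pos::Particle, cost: 10 },
        Entry { surface: "の", pos: Pos::Particle, cost: 10 },
    ];
    let words = segment("すもももももももものうち", &dict).expect("no segmentation found");
    println!("{}", words.join(" | "));
}
```

With these made-up costs the only zero-penalty path alternates nouns and particles. The program therefore prints すもも | も | もも | も | もも | の | うち, which matches the split in the CLI output further down. In a real MeCab-format dictionary, word costs come from the CSV word lists and connection costs come from matrix.def.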

Try the Goya playground, which runs goya-wasm inside a Web Worker.

Getting started

Fetch the latest IPA dictionary

Download the latest IPA dictionary from the official MeCab website and unzip it.
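Before compiling, it can help to confirm that the archive unpacked where you expect. The sketch below assumes the standard mecab-ipadic layout: matrix.def, char.def and unk.def, plus word lists such as Noun.csv. That file list is an assumption about what `goya compile` reads; this README does not state it. `looks_like_ipadic` is a hypothetical helper.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Definition files an unpacked mecab-ipadic directory normally contains.
/// Assumption: this is not taken from Goya's documentation.
const REQUIRED_FILES: [&str; 3] = ["matrix.def", "char.def", "unk.def"];

/// Hypothetical sanity check: true if `dir` has the definition files and at least one CSV.
fn looks_like_ipadic(dir: &Path) -> io::Result<bool> {
    if !REQUIRED_FILES.iter().all(|f| dir.join(f).is_file()) {
        return Ok(false);
    }
    // Word entries live in CSV files such as Noun.csv.
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.extension().map_or(false, |ext| ext == "csv") {
            return Ok(true);
        }
    }
    Ok(false)
}

fn main() -> io::Result<()> {
    // Usage: check_ipadic /path/to/ipadic
    let dir = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    if looks_like_ipadic(Path::new(&dir))? {
        println!("{dir} looks like an unpacked IPA dictionary");
    } else {
        println!("{dir} is missing matrix.def, char.def, unk.def or *.csv files");
    }
    Ok(())
}
```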

Install Goya CLI

cargo install goya-cli

Compile the IPA dictionary

Compile the IPA dictionary to generate a binary dictionary for morphological analysis. It may take a few minutes.

goya compile /path/to/ipadic

The binary dictionary will be generated in the ~/.goya directory by default. You can change the destination with the --dicdir option.

goya --dicdir=/path/to/generated compile /path/to/ipadic
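The lookup rule described above is simple: an explicit --dicdir wins, and otherwise the dictionary lives in ~/.goya. Below is a minimal sketch of that rule, assuming the HOME environment variable identifies the home directory. `resolve_dicdir` is a hypothetical name, and goya-cli's real argument parsing may differ.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical helper: an explicit --dicdir value wins, otherwise fall back to <home>/.goya.
fn resolve_dicdir(dicdir_flag: Option<&str>, home: Option<&str>) -> Option<PathBuf> {
    match dicdir_flag {
        Some(dir) => Some(PathBuf::from(dir)),
        None => home.map(|h| PathBuf::from(h).join(".goya")),
    }
}

fn main() {
    // Simplified flag scan for `--dicdir=...`; not goya-cli's actual parser.
    let args: Vec<String> = env::args().skip(1).collect();
    let flag = args.iter().find_map(|a| a.strip_prefix("--dicdir="));
    let home = env::var("HOME").ok();
    match resolve_dicdir(flag, home.as_deref()) {
        Some(dir) => println!("dictionary directory: {}", dir.display()),
        None => eprintln!("no --dicdir given and HOME is not set"),
    }
}
```

Run with --dicdir=/path/to/generated, it reports that path. Without the flag, it reports $HOME/.goya.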

Run morphological analysis

Goya reads its input from STDIN. The easiest way to try it is to pipe a sentence into goya with echo.

$ echo すもももももももものうち | goya
すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
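Each output line above has the surface form, a tab, and then comma-separated features, and the sentence ends with EOS. For IPADIC the nine feature fields are, in order: part of speech, three part-of-speech subcategories, conjugation type, conjugation form, base form, reading, and pronunciation. An asterisk (*) marks an empty field. The sketch below parses that layout from STDIN, for example `echo すもももももももものうち | goya | parse_goya`. The `Morpheme` struct, `parse_line`, and the `parse_goya` binary name are hypothetical. MeCab typically prints only seven fields for unknown words, so reading and pronunciation are treated as optional here as a precaution.

```rust
use std::io::{self, BufRead};

/// One analyzed token, borrowing from the output line (hypothetical struct).
#[derive(Debug, PartialEq)]
struct Morpheme<'a> {
    surface: &'a str,
    pos: Vec<&'a str>, // part of speech plus non-empty subcategories
    conjugation_type: &'a str,
    conjugation_form: &'a str,
    base_form: &'a str,
    reading: Option<&'a str>,
    pronunciation: Option<&'a str>,
}

/// Parse "surface<TAB>f1,f2,...". Returns None for EOS or malformed lines.
fn parse_line(line: &str) -> Option<Morpheme<'_>> {
    if line == "EOS" {
        return None;
    }
    let (surface, features) = line.split_once('\t')?;
    let f: Vec<&str> = features.split(',').collect();
    if f.len() < 7 {
        return None;
    }
    Some(Morpheme {
        surface,
        // Drop "*" placeholders from the POS hierarchy.
        pos: f[0..4].iter().copied().filter(|s| *s != "*").collect(),
        conjugation_type: f[4],
        conjugation_form: f[5],
        base_form: f[6],
        reading: f.get(7).copied(),
        pronunciation: f.get(8).copied(),
    })
}

fn main() {
    let stdin = io::stdin();
    for line in stdin.lock().lines() {
        let line = line.expect("failed to read STDIN");
        match parse_line(&line) {
            Some(m) => println!("{}\t{} (base form: {})", m.surface, m.pos.join("/"), m.base_form),
            None => println!("{line}"), // pass EOS and anything unparsed through unchanged
        }
    }
}
```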

If you passed the --dicdir option when compiling the dictionary, pass the same path when running goya as well.

echo すもももももももものうち | goya --dicdir=/path/to/generated
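This README documents the CLI but not the goya crate's Rust API. One way to call the analyzer from Rust without guessing at that API is to drive the CLI the same way the echo pipeline does. The sketch below assumes the goya binary from `cargo install goya-cli` is on PATH. `analyze` is a hypothetical helper.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};

/// Pipe `text` into the analyzer's STDIN and return everything it prints.
/// `program` is normally "goya"; `dicdir` mirrors the README's --dicdir option.
fn analyze(program: &str, text: &str, dicdir: Option<&str>) -> io::Result<String> {
    let mut cmd = Command::new(program);
    if let Some(dir) = dicdir {
        cmd.arg(format!("--dicdir={dir}"));
    }
    let mut child = cmd.stdin(Stdio::piped()).stdout(Stdio::piped()).spawn()?;
    // Write the whole input first, then drop the handle so the child sees EOF.
    // Fine for short sentences; very large inputs could fill the pipes and block.
    {
        let mut stdin = child.stdin.take().expect("stdin was piped");
        stdin.write_all(text.as_bytes())?;
        stdin.write_all(b"\n")?; // echo appends a newline too
    }
    let output = child.wait_with_output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{program} exited with {}", output.status),
        ));
    }
    String::from_utf8(output.stdout).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

fn main() -> io::Result<()> {
    let result = analyze("goya", "すもももももももものうち", None)?;
    print!("{result}");
    Ok(())
}
```

If you compiled the dictionary with --dicdir, pass the same path as `Some("/path/to/generated")`.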

Release

cargo release <patch|minor|major> --workspace --no-tag --skip-publish --dependent-version Upgrade
git tag v{{VERSION}}
git push origin v{{VERSION}}
