#sentence #japanese #tokenizer #nlp #text

saku

A simple yet efficient rule-based Japanese Sentence Tokenizer

6 releases

0.1.6 Dec 16, 2021
0.1.5 Dec 15, 2021

#1945 in Text processing

MIT license

12KB
264 lines

Saku: Japanese Sentence Tokenizer

Saku is a rule-based library, written in Rust, for splitting Japanese text into sentences using hand-crafted rules.
The name comes from "割く" (saku), which means "to split something" in Japanese.

This library is named after a Japanese VTuber Saku Sasaki / 笹木咲.

This is the repository for the original Rust implementation.
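To illustrate the general idea of rule-based sentence splitting, here is a minimal sketch in Rust. Note that this is a hypothetical example of the technique, not saku's actual API: the function name `split_sentences` and the specific rules (splitting on 。！？ while keeping trailing closing quotes attached) are assumptions for illustration.

```rust
// Hypothetical sketch of rule-based Japanese sentence splitting.
// Not saku's actual API; shown only to illustrate the technique.
fn split_sentences(text: &str) -> Vec<String> {
    let terminators = ['。', '！', '？'];
    let closers = ['」', '』', '）'];
    let mut sentences = Vec::new();
    let mut current = String::new();
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        current.push(c);
        if terminators.contains(&c) {
            // Keep closing quotes/brackets attached to the sentence they end.
            while let Some(&next) = chars.peek() {
                if closers.contains(&next) {
                    current.push(next);
                    chars.next();
                } else {
                    break;
                }
            }
            sentences.push(current.trim().to_string());
            current = String::new();
        }
    }
    // Flush any trailing text that lacks a terminator.
    if !current.trim().is_empty() {
        sentences.push(current.trim().to_string());
    }
    sentences
}

fn main() {
    let text = "今日は晴れです。明日は雨でしょう。";
    for s in split_sentences(text) {
        println!("{}", s);
    }
}
```

A production tokenizer needs more rules than this, e.g. avoiding splits inside quoted or parenthesized spans, which is the kind of case hand-crafted rule sets like saku's are built to handle.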

No runtime deps