#japanese #sentence #tokenizer #nlp #text

saku

A simple yet efficient rule-based Japanese Sentence Tokenizer

6 releases

0.1.6 Dec 16, 2021
0.1.5 Dec 15, 2021

#1734 in Text processing


57 downloads per month

MIT license

12KB
264 lines

Saku: Japanese Sentence Tokenizer

Saku is a Rust library that splits Japanese text into sentences using hand-crafted rules.
"割く" (saku) means "to split something" in Japanese.

This library is named after the Japanese VTuber Saku Sasaki (笹木咲).

This is the repository for the original Rust implementation.

No runtime deps