3 releases (breaking)
| Version | Released |
|---|---|
| 0.3.0 | Sep 24, 2024 |
| 0.2.0 | Jun 24, 2024 |
| 0.1.1 | May 11, 2024 |
| 0.1.0 | |
#1105 in Text processing
122 downloads per month
58KB, 2K SLoC
TiniestSegmenter
A port of TinySegmenter written in pure, safe Rust with no dependencies. Bindings are available for both Rust and Python.
TinySegmenter is an n-gram word tokenizer for Japanese text originally built by Taku Kudo (2008).
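To give a rough feel for how an n-gram tokenizer of this kind decides where words end, here is a toy sketch. It scores each candidate boundary from the script types of the two neighbouring characters and splits when the score is positive. The `CharType` enum, `bigram_weight`, `BIAS`, and `toy_segment` are all hypothetical names, and the weights are invented for illustration; this is not the trained TinySegmenter model and not this crate's implementation.

```rust
// Toy illustration only: the character classes and weights below are
// invented for this sketch and are NOT the trained TinySegmenter model.

#[derive(Clone, Copy, PartialEq, Debug)]
enum CharType {
    Hiragana,
    Katakana,
    Kanji,
    Other,
}

// Classify a character by Unicode block (a coarse approximation).
fn char_type(c: char) -> CharType {
    match c {
        '\u{3041}'..='\u{309F}' => CharType::Hiragana,
        '\u{30A0}'..='\u{30FF}' => CharType::Katakana,
        '\u{4E00}'..='\u{9FFF}' => CharType::Kanji,
        _ => CharType::Other,
    }
}

// Hypothetical weight for the character-type bigram around a boundary.
fn bigram_weight(left: CharType, right: CharType) -> i32 {
    match (left, right) {
        // Same script on both sides: strongly prefer not to split.
        (a, b) if a == b => -10,
        // Kanji followed by hiragana often continues the same word.
        (CharType::Kanji, CharType::Hiragana) => -2,
        // Any other script change: prefer to split.
        _ => 5,
    }
}

// Constant added to every boundary score (made up for this sketch).
const BIAS: i32 = -1;

// Split `text` wherever the boundary score is positive.
fn toy_segment(text: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut start = 0;
    let mut prev: Option<char> = None;
    for (idx, c) in text.char_indices() {
        if let Some(p) = prev {
            let score = BIAS + bigram_weight(char_type(p), char_type(c));
            if score > 0 {
                tokens.push(&text[start..idx]);
                start = idx;
            }
        }
        prev = Some(c);
    }
    if start < text.len() {
        tokens.push(&text[start..]);
    }
    tokens
}
```

With these made-up weights, `toy_segment("ジャガイモが好きです。")` yields `["ジャガイモ", "が", "好きです", "。"]`. A real model sums many more learned n-gram features per boundary, which is what lets it make finer splits than a script-change heuristic can.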
Usage
Add the crate to your project: cargo add tiniestsegmenter
use tiniestsegmenter as ts;

fn main() {
    // Split a Japanese sentence into word tokens.
    let tokens: Vec<&str> = ts::tokenize("ジャガイモが好きです。");
    println!("{:?}", tokens);
}