#nlp #chinese #segmentation

jieba-rs

The Jieba Chinese Word Segmentation Implemented in Rust

41 releases

new 0.7.0 Apr 16, 2024
0.6.8 Jul 16, 2023
0.6.7 Oct 3, 2022
0.6.6 Nov 9, 2021
0.2.3 Jul 6, 2018

#70 in Text processing


22,400 downloads per month
Used in 18 crates (13 directly)

MIT license

4.5MB
2K SLoC

jieba-rs


🚀 Help me to become a full-time open-source developer by sponsoring me on GitHub


Installation

Add it to your Cargo.toml:

[dependencies]
jieba-rs = "0.6"

Then you are good to go. If you are using Rust 2015, you also need to add extern crate jieba_rs; to your crate root.

Example

use jieba_rs::Jieba;

fn main() {
    let jieba = Jieba::new();
    let words = jieba.cut("我们中出了一个叛徒", false);
    assert_eq!(words, vec!["我们", "中", "出", "了", "一个", "叛徒"]);
}

Enabling Additional Features

  • default-dict feature enables the embedded dictionary; this feature is enabled by default
  • tfidf feature enables TF-IDF keywords extractor
  • textrank feature enables TextRank keywords extractor

[dependencies]
jieba-rs = { version = "0.6", features = ["tfidf", "textrank"] }

Run benchmark

cargo bench --all-features

Benchmark: Compare with cppjieba

jieba-rs bindings

License

This work is released under the MIT license. A copy of the license is provided in the LICENSE file.

Dependencies

~3–5MB
~90K SLoC