#nlp #chinese #segmentation

jieba-rs

The Jieba Chinese Word Segmentation Implemented in Rust



Installation

Add it to your Cargo.toml:

[dependencies]
jieba-rs = "0.6"

Then you are good to go. If you are using Rust 2015, you also need to add extern crate jieba_rs; to your crate root.

Example

use jieba_rs::Jieba;

fn main() {
    let jieba = Jieba::new();
    let words = jieba.cut("我们中出了一个叛徒", false);
    assert_eq!(words, vec!["我们", "中", "出", "了", "一个", "叛徒"]);
}

Enabling Additional Features

  • default-dict feature enables the embedded dictionary; this feature is enabled by default
  • tfidf feature enables the TF-IDF keyword extractor
  • textrank feature enables the TextRank keyword extractor

[dependencies]
jieba-rs = { version = "0.6", features = ["tfidf", "textrank"] }

Run benchmark

cargo bench --all-features

Benchmark: Compare with cppjieba

jieba-rs bindings

License

This work is released under the MIT license. A copy of the license is provided in the LICENSE file.
