#nlp #chinese #segmentation

jieba-rs

The Jieba Chinese word segmentation library, implemented in Rust

28 releases

✓ Uses Rust 2018 edition

0.5.0 May 14, 2020
0.4.10 Aug 14, 2019
0.4.9 Jul 14, 2019
0.2.5 Oct 30, 2018
0.2.3 Jul 6, 2018

#44 in Text processing


871 downloads per month
Used in 11 crates (6 directly)

MIT license

4.5MB
2K SLoC


Installation

Add it to your Cargo.toml:

[dependencies]
jieba-rs = "0.5"

and you are good to go. If you are using Rust 2015, you also need to add extern crate jieba_rs to your crate root.
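For Rust 2015, that declaration looks like this:

// Rust 2015 only: declare the dependency at the crate root
// (src/lib.rs or src/main.rs)
extern crate jieba_rs;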

Example

use jieba_rs::Jieba;

fn main() {
    let jieba = Jieba::new();
    // The boolean flag toggles HMM-based discovery of words
    // that are not in the dictionary.
    let words = jieba.cut("我们中出了一个叛徒", false);
    assert_eq!(words, vec!["我们", "中", "出", "了", "一个", "叛徒"]);
}
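Beyond plain cut, the crate also provides cut_all and cut_for_search segmentation modes, and add_word for extending the dictionary at runtime. A minimal sketch follows; the sample sentences and the frequency value are arbitrary illustrations, so check docs.rs for the exact signatures:

use jieba_rs::Jieba;

fn main() {
    let mut jieba = Jieba::new();

    // Finer-grained segmentation suited to search-engine indexing;
    // the boolean again toggles HMM-based handling of unseen words.
    let tokens = jieba.cut_for_search("小明硕士毕业于中国科学院计算所", true);
    println!("{:?}", tokens);

    // Add a word to the dictionary at runtime; frequency and POS tag
    // are optional, and None lets jieba-rs pick defaults.
    jieba.add_word("机器学习", Some(100000), None);
    let words = jieba.cut("我们在研究机器学习", false);
    println!("{:?}", words);
}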

Enabling Additional Features

  • default-dict feature enables the embedded dictionary; this feature is enabled by default
  • tfidf feature enables the TF-IDF keyword extractor (see the sketch after the snippet below)
  • textrank feature enables the TextRank keyword extractor

[dependencies]
jieba-rs = { version = "0.5", features = ["tfidf", "textrank"] }
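With the tfidf feature enabled, keyword extraction works roughly as below. This is a minimal sketch: TFIDF::new_with_jieba and extract_tags follow the crate's documented API, but the sample sentence and the top_k value are arbitrary, so verify the exact signatures and return type on docs.rs:

use jieba_rs::Jieba;
use jieba_rs::{KeywordExtract, TFIDF};

fn main() {
    let jieba = Jieba::new();
    // Build a TF-IDF keyword extractor on top of an existing segmenter.
    let extractor = TFIDF::new_with_jieba(&jieba);
    // Extract the top 3 keywords; an empty vec applies no part-of-speech filter.
    let top_k = extractor.extract_tags(
        "今天纽约的天气真好啊，京华大酒店的张尧经理吃了一只北京烤鸭。",
        3,
        vec![],
    );
    println!("{:?}", top_k);
}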

Run benchmark

cargo bench --all-features

A benchmark comparing jieba-rs with cppjieba is also available.

License

This work is released under the MIT license. A copy of the license is provided in the LICENSE file.
