#nlp #chinese #segmentation

jieba-rs

The Jieba Chinese word segmentation library, implemented in Rust

39 releases

0.6.7 Oct 3, 2022
0.6.6 Nov 9, 2021
0.6.5 Jul 3, 2021
0.6.2 Nov 17, 2020
0.2.3 Jul 6, 2018

#71 in Text processing

Download history: roughly 1,500–2,700 downloads per week from mid-October 2022 through late January 2023

9,812 downloads per month
Used in 17 crates (12 directly)

MIT license

4.5MB
2K SLoC

jieba-rs


🚀 Help me to become a full-time open-source developer by sponsoring me on GitHub

The Jieba Chinese word segmentation library, implemented in Rust

Installation

Add it to your Cargo.toml:

[dependencies]
jieba-rs = "0.6"

then you are good to go. If you are using the Rust 2015 edition, you also need to add extern crate jieba_rs; to your crate root.
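
For example, a 2015-edition crate root (a hypothetical src/main.rs) would begin with:

extern crate jieba_rs; // only needed on the 2015 edition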

Example

use jieba_rs::Jieba;

fn main() {
    let jieba = Jieba::new();
    // The second argument toggles the HMM; false disables it
    let words = jieba.cut("我们中出了一个叛徒", false);
    assert_eq!(words, vec!["我们", "中", "出", "了", "一个", "叛徒"]);
}
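
The second argument to cut toggles HMM-based recognition of words that are missing from the dictionary. As a further illustration, here is a minimal sketch of the other segmentation modes the crate provides (results are printed rather than asserted, since the exact tokens depend on the bundled dictionary):

use jieba_rs::Jieba;

fn main() {
    let jieba = Jieba::new();

    // HMM enabled: out-of-vocabulary words can be grouped into tokens
    println!("{:?}", jieba.cut("我们中出了一个叛徒", true));

    // cut_all lists every word the dictionary can form from the input
    println!("{:?}", jieba.cut_all("中国科学院计算所"));

    // cut_for_search over-segments long words, which suits search indexing
    println!("{:?}", jieba.cut_for_search("小明硕士毕业于中国科学院计算所", true));
}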

Enabling Additional Features

  • The default-dict feature enables the embedded dictionary; it is enabled by default
  • The tfidf feature enables the TF-IDF keyword extractor
  • The textrank feature enables the TextRank keyword extractor

To turn the extractors on, list the features in Cargo.toml (a usage sketch follows the snippet):
[dependencies]
jieba-rs = { version = "0.6", features = ["tfidf", "textrank"] }
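
With both features enabled, keyword extraction looks roughly like the sketch below. It assumes the 0.6-era API, where TextRank::new_with_jieba borrows the segmenter and the KeywordExtract trait provides extract_tags; check docs.rs for the exact signatures of the version you use.

use jieba_rs::Jieba;
use jieba_rs::{KeywordExtract, TextRank};

fn main() {
    let jieba = Jieba::new();

    // Assumed 0.6-era constructor: the extractor borrows the segmenter
    let extractor = TextRank::new_with_jieba(&jieba);

    // Top 3 keywords; an empty allowed_pos vector means no POS filtering
    let keywords = extractor.extract_tags("晚上想吃火锅，明天想吃烤肉", 3, vec![]);
    for kw in keywords {
        // Keyword is assumed to expose keyword and weight fields
        println!("{} {:.3}", kw.keyword, kw.weight);
    }
}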

Run benchmark

cargo bench --all-features

Benchmark: Compare with cppjieba

jieba-rs bindings

License

This work is released under the MIT license. A copy of the license is provided in the LICENSE file.
