
jieba-rs


🚀 Help me to become a full-time open-source developer by sponsoring me on GitHub

The Jieba Chinese Word Segmentation Implemented in Rust

Installation

Add it to your Cargo.toml:

[dependencies]
jieba-rs = "0.7"

then you are good to go. If you are on the Rust 2015 edition, you also need to add extern crate jieba_rs; to your crate root.

Example

use jieba_rs::Jieba;

fn main() {
    let jieba = Jieba::new();
    let words = jieba.cut("我们中出了一个叛徒", false);
    assert_eq!(words, vec!["我们", "中", "出", "了", "一个", "叛徒"]);
}

Enabling Additional Features

  • default-dict feature enables the embedded dictionary; this feature is enabled by default
  • tfidf feature enables the TF-IDF keyword extractor
  • textrank feature enables the TextRank keyword extractor

[dependencies]
jieba-rs = { version = "0.7", features = ["tfidf", "textrank"] }

Run benchmark

cargo bench --all-features

Benchmark: Compare with cppjieba

jieba-rs bindings

License

This work is released under the MIT license. A copy of the license is provided in the LICENSE file.
