11 unstable releases (5 breaking)

| Version | Released |
|---|---|
| 0.7.0 | Jan 17, 2024 |
| 0.6.1 | Aug 28, 2023 |
| 0.5.0 | Jul 4, 2023 |
| 0.3.3 | Mar 14, 2023 |
| 0.1.2 | Nov 22, 2022 |
#560 in Text processing
Used in tantivy-ik
1MB, 2K SLoC
ik-rs
ik-analyzer for Rust
Usage
Add the crate to Cargo.toml:
[dependencies]
ik-rs = "0.7.0"
Chinese Segmentation
    #[cfg(test)]
    mod test {
        use ik_rs::core::ik_segmenter::{IKSegmenter, TokenMode};

        #[test]
        pub fn test_ik() {
            let mut ik = IKSegmenter::new();
            let text = "中华人民共和国";
            // Tokenize in INDEX mode; TokenMode::SEARCH is the alternative mode.
            let tokens = ik.tokenize(text, TokenMode::INDEX);
            // Collect the text of each token for comparison.
            let mut token_texts = Vec::new();
            for token in tokens.iter() {
                println!("{:?}", token);
                token_texts.push(token.get_lexeme_text());
            }
            assert_eq!(
                token_texts,
                vec![
                    "中华人民共和国",
                    "中华人民",
                    "中华",
                    "华人",
                    "人民共和国",
                    "人民",
                    "共和国",
                    "共和",
                    "国"
                ]
            )
        }
    }
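The overlapping tokens in the test above can be pictured with a minimal, std-only sketch of exhaustive dictionary matching. This is a toy under stated assumptions, not ik-rs's implementation or API: the function `enumerate_words` and the caller-supplied dictionary are hypothetical and exist only for illustration.

```rust
use std::collections::HashSet;

/// Toy exhaustive dictionary matcher (hypothetical helper, NOT part of ik-rs).
/// For every start position it emits each dictionary word that begins there,
/// longest first, so overlapping words are all reported.
fn enumerate_words(text: &str, dict: &HashSet<&str>) -> Vec<String> {
    // Index by `char`, not by byte, so multi-byte CJK characters stay intact.
    let chars: Vec<char> = text.chars().collect();
    let mut out = Vec::new();
    for start in 0..chars.len() {
        // Try the longest candidate first, then progressively shorter ones.
        for end in ((start + 1)..=chars.len()).rev() {
            let candidate: String = chars[start..end].iter().collect();
            if dict.contains(candidate.as_str()) {
                out.push(candidate);
            }
        }
    }
    out
}
```

Called with a dictionary that contains exactly the nine words asserted in the test, `enumerate_words("中华人民共和国", &dict)` returns them in the same order as the assertion: grouped by start position, longest match first. The sketch is quadratic in the text length and is meant only to show the shape of the output.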
Benchmark
High performance: the tokenize benchmark runs in about 19.6 µs per iteration. To reproduce:
cargo bench
ik_tokenize_benchmark time: [19.366 µs 19.572 µs 19.850 µs]
change: [-1.5364% -0.4029% +0.7357%] (p = 0.51 > 0.05)
Usage with Tantivy
Use the tantivy-ik project.
Rust developers and search engine developers are welcome to join us and maintain this project together!
You can open a PR or submit an issue,
and star⭐️ or fork this project to support me!
Dependencies
~2–8MB
~76K SLoC