1 unstable release

0.1.0 Jan 2, 2025

#805 in Machine learning

214 downloads per month

Apache-2.0

16KB
438 lines

Tocken

Tokenizer implemented in Rust.

This tokenizer is based on Lucene's EnglishAnalyzer.
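For readers unfamiliar with Lucene's EnglishAnalyzer, the sketch below illustrates the general shape of that kind of pipeline: split text into tokens, strip English possessives, lowercase, and drop stop words. It is not Tocken's API or implementation. The `analyze` function and the `STOP_WORDS` list are hypothetical names made up for this illustration, the stop-word list is a tiny subset, and the Porter stemming step that EnglishAnalyzer applies last is left out to keep the example short.

```rust
use std::collections::HashSet;

/// A tiny, illustrative subset of English stop words (assumption: not the
/// list used by Lucene or by Tocken).
const STOP_WORDS: &[&str] = &["a", "an", "and", "the", "is", "of", "to", "in", "on", "for"];

/// Hypothetical helper, not part of Tocken: an EnglishAnalyzer-style pass
/// that tokenizes, strips possessives, lowercases, and removes stop words.
/// Stemming is intentionally omitted.
fn analyze(text: &str) -> Vec<String> {
    let stop: HashSet<&str> = STOP_WORDS.iter().copied().collect();
    // Split on anything that is neither alphanumeric nor an apostrophe,
    // so that "tokenizer's" stays in one piece for the possessive step.
    text.split(|c: char| !(c.is_alphanumeric() || c == '\''))
        // Drop stray leading or trailing apostrophes (e.g. quotes).
        .map(|raw| raw.trim_matches('\''))
        // Strip a trailing possessive "'s" or "'S".
        .map(|tok| {
            tok.strip_suffix("'s")
                .or_else(|| tok.strip_suffix("'S"))
                .unwrap_or(tok)
        })
        .map(|tok| tok.to_lowercase())
        // Remove empty fragments and stop words.
        .filter(|tok| !tok.is_empty() && !stop.contains(tok.as_str()))
        .collect()
}
```

Calling `analyze("The tokenizer's output is a list of terms.")` yields `["tokenizer", "output", "list", "terms"]`. For real use, the crate's own docs are the reference for what the tokenizer actually does.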

Usage

  • as a library: see the main.rs file for a usage example, and the docs for the API.
  • as a CLI:
    • cargo r -r -- --help
    • cargo r -r -- -i wiki.txt -o wiki_tocken_f10.json -f 10

Dependencies

~10MB
~168K SLoC