10 releases
Version | Released |
---|---|
0.3.2 | Jan 5, 2022 |
0.3.1 | Jan 5, 2022 |
0.2.0 | Jan 5, 2022 |
0.1.5 | Apr 10, 2021 |
0.1.3 | Mar 24, 2021 |
#1793 in Text processing
36 downloads per month
Used in lingo
18KB
422 lines
textcat-rs
Library to extract N-grams from texts. This is a low-level library; Lingo is built on top of it to detect human languages in texts.
This library provides tools to extract N-grams from sample texts and to create and train categories from them. The trained data can be serialized for later use. The library also provides tools to determine which pretrained category a given text is closest to.
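The workflow described above (extract N-grams, build a ranked profile per category, then find the closest category for a new text) can be sketched with the standard library alone. This is not the textcat-rs API: the function names `ngram_counts`, `profile`, `distance`, and `closest` are hypothetical, and the rank-based "out-of-place" distance is an assumption about the technique, borrowed from the classic TextCat approach rather than confirmed from this crate's source.

```rust
use std::collections::HashMap;

/// Count character n-grams of length 1..=max_n in `text` (hypothetical helper).
fn ngram_counts(text: &str, max_n: usize) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        // Pad each word with '_' so word-boundary n-grams are captured too.
        let padded: Vec<char> = format!("_{}_", word.to_lowercase()).chars().collect();
        for n in 1..=max_n {
            for window in padded.windows(n) {
                *counts.entry(window.iter().collect::<String>()).or_insert(0) += 1;
            }
        }
    }
    counts
}

/// "Train" a profile: n-grams ranked by descending frequency
/// (ties broken alphabetically), truncated to `limit` entries.
fn profile(text: &str, max_n: usize, limit: usize) -> Vec<String> {
    let mut entries: Vec<(String, usize)> = ngram_counts(text, max_n).into_iter().collect();
    entries.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    entries.into_iter().take(limit).map(|(gram, _)| gram).collect()
}

/// Out-of-place distance: sum of rank differences between the two profiles.
/// An n-gram missing from the category costs the category's profile length.
fn distance(category: &[String], sample: &[String]) -> usize {
    let ranks: HashMap<&str, usize> = category
        .iter()
        .enumerate()
        .map(|(rank, gram)| (gram.as_str(), rank))
        .collect();
    sample
        .iter()
        .enumerate()
        .map(|(rank, gram)| match ranks.get(gram.as_str()) {
            Some(&category_rank) => rank.abs_diff(category_rank),
            None => category.len(),
        })
        .sum()
}

/// Return the name of the category whose profile is closest to `sample`.
fn closest<'a>(categories: &'a [(String, Vec<String>)], sample: &[String]) -> Option<&'a str> {
    categories
        .iter()
        .min_by_key(|(_, category)| distance(category, sample))
        .map(|(name, _)| name.as_str())
}
```

In use, one would build one `profile` per training text, store the `(name, profile)` pairs (these are plain `Vec<String>` values, so any serialization format would do), and later call `closest` with the profile of an unseen text. Smaller distances mean the unseen text ranks its N-grams more like the category does.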
lib.rs: Textcat
Library to extract and categorize texts by ngrams.
Dependencies
~1–2MB
~39K SLoC