#text #ngrams #detect #language #category #extract #sample

bin+lib textcat

Simple library to detect text categories. It can be used to detect the language of a given text.

10 releases

0.3.2 Jan 5, 2022
0.3.1 Jan 5, 2022
0.2.0 Jan 5, 2022
0.1.5 Apr 10, 2021
0.1.3 Mar 24, 2021

#1532 in Text processing

Download history: 9/week @ 2024-02-25, 7/week @ 2024-03-10, 125/week @ 2024-03-31

132 downloads per month
Used in lingo

MIT license

18KB
422 lines

textcat-rs

Library to extract N-Grams from texts. This is a low-level library. Lingo is built on top of this library to detect human languages in texts.
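
To make "extracting N-Grams" concrete, here is a minimal sketch of character n-gram extraction in plain Rust. The function name char_ngrams is hypothetical and is not taken from the textcat API; the crate's own types and signatures may differ.

```rust
/// Returns every character n-gram of length `n` in `text`, in order of appearance.
/// Hypothetical helper for illustration only; not part of the textcat API.
fn char_ngrams(text: &str, n: usize) -> Vec<String> {
    // Work on chars rather than bytes so multi-byte UTF-8 characters stay whole.
    let chars: Vec<char> = text.chars().collect();
    // `windows(0)` would panic, and a text shorter than `n` has no n-grams.
    if n == 0 || chars.len() < n {
        return Vec::new();
    }
    chars
        .windows(n)
        .map(|w| w.iter().collect::<String>())
        .collect()
}
```

For example, char_ngrams("text", 2) yields "te", "ex", "xt".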

This library provides tools to train on sample texts: extracting N-Grams, creating samples, and training categories. The trained data can be serialized for later use. The library also provides tools to detect which pretrained category a given text is closest to.
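
The train-then-detect workflow can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: it uses the rank-based "out-of-place" distance that is common in n-gram text categorization, and whether textcat uses exactly this measure is an assumption. All names (ngram_counts, profile, distance, closest) are hypothetical.

```rust
use std::collections::HashMap;

/// Counts character n-grams of lengths 1..=max_n in the lowercased text.
fn ngram_counts(text: &str, max_n: usize) -> HashMap<String, usize> {
    let chars: Vec<char> = text.to_lowercase().chars().collect();
    let mut counts = HashMap::new();
    for n in 1..=max_n {
        for w in chars.windows(n) {
            *counts.entry(w.iter().collect::<String>()).or_insert(0) += 1;
        }
    }
    counts
}

/// "Training" (assumed): keep the `top` most frequent n-grams, most frequent first.
fn profile(text: &str, max_n: usize, top: usize) -> Vec<String> {
    let mut items: Vec<(String, usize)> = ngram_counts(text, max_n).into_iter().collect();
    // Descending count, then alphabetical so that ties are ordered deterministically.
    items.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    items.into_iter().take(top).map(|(gram, _)| gram).collect()
}

/// Out-of-place distance: sum of rank differences between the two profiles.
/// An n-gram missing from the category profile costs the maximum penalty.
fn distance(category: &[String], sample: &[String]) -> usize {
    let rank: HashMap<&str, usize> = category
        .iter()
        .enumerate()
        .map(|(i, gram)| (gram.as_str(), i))
        .collect();
    sample
        .iter()
        .enumerate()
        .map(|(i, gram)| match rank.get(gram.as_str()) {
            Some(&j) => i.abs_diff(j),
            None => category.len(),
        })
        .sum()
}

/// Returns the name of the category whose profile is closest to the sample.
fn closest<'a>(categories: &'a [(String, Vec<String>)], sample: &[String]) -> Option<&'a str> {
    categories
        .iter()
        .min_by_key(|(_, p)| distance(p, sample))
        .map(|(name, _)| name.as_str())
}
```

In this sketch a trained category is just a named, ordered list of n-grams, so it could be written out with any serialization format and reloaded later; the format textcat itself uses is not specified here.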


lib.rs:

Textcat

Library to extract and categorize texts by ngrams.

Dependencies

~1.2–2MB
~41K SLoC