#text #ngrams #detect #language #category #extract #sample

bin+lib textcat

Simple library to detect text categories. It can be used to detect the language of a given text.

10 releases

0.3.2 Jan 5, 2022
0.3.1 Jan 5, 2022
0.2.0 Jan 5, 2022
0.1.5 Apr 10, 2021
0.1.3 Mar 24, 2021


36 downloads per month
Used in lingo

MIT license

18KB
422 lines

textcat-rs

Library to extract N-grams from texts. This is a low-level library; Lingo is built on top of it to detect the human language of a text.
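This page does not show textcat's own API, so the following is only a minimal, stdlib-only sketch of character N-gram extraction, not the crate's implementation. The function name `extract_ngrams`, the `_` word padding, and the `min_n..=max_n` length range are assumptions chosen for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not textcat's API): counts character n-grams of
/// length `min_n..=max_n` in `text`. Each word is lowercased and padded
/// with `_` on both sides so that word boundaries appear in the n-grams.
fn extract_ngrams(text: &str, min_n: usize, max_n: usize) -> HashMap<String, usize> {
    assert!(min_n >= 1, "n-grams must have length at least 1");
    let mut counts = HashMap::new();
    // Split on anything that is not a letter and drop empty pieces.
    for word in text.split(|c: char| !c.is_alphabetic()).filter(|w| !w.is_empty()) {
        let padded: Vec<char> = std::iter::once('_')
            .chain(word.to_lowercase().chars())
            .chain(std::iter::once('_'))
            .collect();
        for n in min_n..=max_n {
            // Every contiguous window of n characters is one n-gram.
            for window in padded.windows(n) {
                let gram: String = window.iter().collect();
                *counts.entry(gram).or_insert(0) += 1;
            }
        }
    }
    counts
}

fn main() {
    let grams = extract_ngrams("the cat", 1, 3);
    // "the" is padded to "_the_", which yields the trigrams "_th", "the", "he_".
    println!("{:?}", grams.get("the"));
}
```

Padding words with a boundary marker is a common choice in N-gram text categorization because it lets prefixes and suffixes (such as `_th` or `ng_`) act as distinct features.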

This library provides tools to extract N-grams from texts, create samples from training texts, and train categories on them. The trained data can be serialized for later use. The library also provides tools to determine which pretrained category a given text is closest to.
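A common way to implement this train-then-detect workflow, used by TextCat-style classifiers, is the rank-order method of Cavnar and Trenkle: each category is reduced to a "profile" of its most frequent N-grams, and a text is assigned to the category whose profile has the smallest "out-of-place" distance from the text's own profile. Whether textcat uses exactly this scheme is an assumption. The names `profile`, `out_of_place`, and `classify` and the parameters `max_n` and `top_k` are hypothetical, and serialization is left out (a profile is just an ordered list of strings).

```rust
use std::collections::HashMap;

/// Hypothetical helper: counts `_`-padded character n-grams of length 1..=max_n.
fn ngram_counts(text: &str, max_n: usize) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in text.split(|c: char| !c.is_alphabetic()).filter(|w| !w.is_empty()) {
        let chars: Vec<char> = format!("_{}_", word.to_lowercase()).chars().collect();
        for n in 1..=max_n {
            for w in chars.windows(n) {
                *counts.entry(w.iter().collect::<String>()).or_insert(0) += 1;
            }
        }
    }
    counts
}

/// A profile is the `top_k` most frequent n-grams, ranked by descending count
/// (ties broken alphabetically so the order is deterministic).
fn profile(text: &str, max_n: usize, top_k: usize) -> Vec<String> {
    let mut ranked: Vec<(String, usize)> = ngram_counts(text, max_n).into_iter().collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.into_iter().take(top_k).map(|(g, _)| g).collect()
}

/// Out-of-place distance: for each n-gram in the document profile, add the
/// difference between its rank there and its rank in the category profile.
/// N-grams missing from the category cost a fixed maximum penalty.
fn out_of_place(doc: &[String], category: &[String]) -> usize {
    let ranks: HashMap<&str, usize> =
        category.iter().enumerate().map(|(i, g)| (g.as_str(), i)).collect();
    let max_penalty = category.len();
    doc.iter()
        .enumerate()
        .map(|(i, g)| match ranks.get(g.as_str()) {
            Some(&j) => i.abs_diff(j),
            None => max_penalty,
        })
        .sum()
}

/// Returns the name of the trained category whose profile is closest to `text`.
fn classify<'a>(
    text: &str,
    categories: &'a [(String, Vec<String>)],
    max_n: usize,
    top_k: usize,
) -> Option<&'a str> {
    let doc = profile(text, max_n, top_k);
    categories
        .iter()
        .min_by_key(|(_, p)| out_of_place(&doc, p))
        .map(|(name, _)| name.as_str())
}

fn main() {
    // "Training": build one profile per category from (tiny, made-up) samples.
    let categories = vec![
        ("english".to_string(),
         profile("the quick brown fox jumps over the lazy dog and the cat", 3, 300)),
        ("spanish".to_string(),
         profile("el rápido zorro marrón salta sobre el perro perezoso y el gato", 3, 300)),
    ];
    // "Detection": pick the category with the smallest out-of-place distance.
    println!("{:?}", classify("the dog and the fox", &categories, 3, 300));
}
```

With samples this small the result is only indicative; real profiles are built from much larger corpora. Comparing ranks rather than raw counts lets profiles built from texts of very different lengths be compared directly.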


lib.rs:

Textcat

Library to extract N-grams from texts and categorize texts by their N-grams.

Dependencies

~1–2MB
~39K SLoC