3 unstable releases

| Version | Released |
|---|---|
| 0.2.1 (new) | Mar 2, 2025 |
| 0.2.0 | Feb 14, 2025 |
| 0.1.0 | Jan 31, 2025 |
#812 in Text processing
243 downloads per month
270KB, 5.5K SLoC
Alphabet Detector
Detects 387 alphabets in 170 scripts
One spoken language can be written in multiple scripts; each script is then detected as a separate alphabet/language.
See alphabet.rs for the languages that already have defined alphabets. Some of them still need validation.
Separates text into words (from a CharIndices iterator) and detects the language of each word from the alphabets (characters) it uses.
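The word-splitting idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not this crate's API: `Script`, `script_of`, and `words_with_script` are hypothetical names, the character ranges cover only a few Unicode blocks, and a word is labelled by its first character only, which is much cruder than matching against 387 alphabets.

```rust
/// Coarse script buckets for this sketch only (hypothetical type,
/// not part of the crate).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Script {
    Latin,
    Cyrillic,
    Greek,
    Other,
}

/// Classifies one character by a few Unicode block ranges.
/// Deliberately incomplete; intended for alphabetic characters only.
pub fn script_of(ch: char) -> Script {
    match ch {
        'A'..='Z' | 'a'..='z' | '\u{00C0}'..='\u{024F}' => Script::Latin,
        '\u{0370}'..='\u{03FF}' => Script::Greek,
        '\u{0400}'..='\u{04FF}' => Script::Cyrillic,
        _ => Script::Other,
    }
}

/// Splits `text` into words by walking `char_indices`, returning each
/// word with its byte offset and the script of its first character.
pub fn words_with_script(text: &str) -> Vec<(usize, &str, Script)> {
    let mut out = Vec::new();
    let mut start: Option<usize> = None;
    for (idx, ch) in text.char_indices() {
        if ch.is_alphabetic() {
            // Remember where the current word begins.
            if start.is_none() {
                start = Some(idx);
            }
        } else if let Some(s) = start.take() {
            // A non-alphabetic character closes the current word.
            out.push(make_word(text, s, idx));
        }
    }
    // Flush a word that runs to the end of the input.
    if let Some(s) = start {
        out.push(make_word(text, s, text.len()));
    }
    out
}

/// Builds one result entry from a byte range of `text`.
fn make_word(text: &str, start: usize, end: usize) -> (usize, &str, Script) {
    let w = &text[start..end];
    let script = w.chars().next().map(script_of).unwrap_or(Script::Other);
    (start, w, script)
}
```

Calling `words_with_script("Hello мир αβγ")` yields three entries, one per word, each carrying the byte offset where the word starts. Byte offsets from `char_indices` are what make slicing the original `&str` safe on multi-byte text.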
Warning: may return words containing characters from the Unicode Private Use Area (for the Yoruba or Nuer languages).
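Callers who need to flag such words could check for private-use code points themselves. The sketch below uses only the standard library; `is_private_use` and `has_private_use` are hypothetical helper names, not functions exported by this crate. The three ranges are the Private Use Area in the Basic Multilingual Plane and the two supplementary private-use planes.

```rust
/// Returns true if `ch` lies in one of the three Unicode private-use ranges.
pub fn is_private_use(ch: char) -> bool {
    matches!(
        ch,
        '\u{E000}'..='\u{F8FF}'           // Private Use Area (BMP)
            | '\u{F0000}'..='\u{FFFFD}'   // Supplementary Private Use Area-A
            | '\u{100000}'..='\u{10FFFD}' // Supplementary Private Use Area-B
    )
}

/// Returns true if any character of `word` is a private-use code point.
pub fn has_private_use(word: &str) -> bool {
    word.chars().any(is_private_use)
}
```

Whether to drop, keep, or remap such words depends on the application; the check only makes their presence visible.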
Dependencies
~3–4MB
~68K SLoC