#alphabet #language #detection #word #nlp #separator #detect

bin+lib alphabet_detector

Natural language alphabet detection library

3 unstable releases

new 0.2.1 Mar 2, 2025
0.2.0 Feb 14, 2025
0.1.0 Jan 31, 2025

#812 in Text processing

Download history 104/week @ 2025-01-27 25/week @ 2025-02-03 105/week @ 2025-02-10 9/week @ 2025-02-17

243 downloads per month

MIT/Apache

270KB
5.5K SLoC

Alphabet Detector


Detects 387 alphabets in 170 scripts

One spoken language can be written in multiple scripts; each script is detected as a separate alphabet/language.

See alphabet.rs for the languages that already have defined alphabets. Some of them still need validation.

Separates text into words (from a CharIndices iterator) and detects the language of each word from the alphabets its chars belong to.
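The general idea can be sketched with the standard library alone. This is not the crate's API: the `Script` enum, `script_of`, `dominant_script` and `words_with_script` are hypothetical names made up for illustration, and the handful of hard-coded Unicode ranges is a rough stand-in for the crate's much larger alphabet tables.

```rust
/// Coarse script classes, for this sketch only (an assumption, not the crate's types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Script {
    Latin,
    Cyrillic,
    Greek,
    Other,
}

/// Hypothetical helper: classify one char by a few Unicode block ranges.
fn script_of(c: char) -> Script {
    match c as u32 {
        0x0041..=0x005A | 0x0061..=0x007A | 0x00C0..=0x024F => Script::Latin,
        0x0370..=0x03FF => Script::Greek,
        0x0400..=0x052F => Script::Cyrillic,
        _ => Script::Other,
    }
}

/// Pick the script that most chars of the word belong to (ties favor the earlier variant).
fn dominant_script(word: &str) -> Script {
    let order = [Script::Latin, Script::Cyrillic, Script::Greek, Script::Other];
    let mut counts = [0usize; 4];
    for c in word.chars() {
        counts[script_of(c) as usize] += 1;
    }
    let mut best = 0;
    for i in 1..counts.len() {
        if counts[i] > counts[best] {
            best = i;
        }
    }
    order[best]
}

/// Walk `char_indices`, cut the text at non-alphabetic chars, and return
/// (byte offset, word slice, dominant script) for every word found.
fn words_with_script(text: &str) -> Vec<(usize, &str, Script)> {
    let mut out = Vec::new();
    let mut start: Option<usize> = None;
    for (i, c) in text.char_indices() {
        if c.is_alphabetic() {
            if start.is_none() {
                start = Some(i);
            }
        } else if let Some(s) = start.take() {
            let word = &text[s..i];
            out.push((s, word, dominant_script(word)));
        }
    }
    // Flush a word that runs to the end of the input.
    if let Some(s) = start {
        let word = &text[s..];
        out.push((s, word, dominant_script(word)));
    }
    out
}
```

Calling `words_with_script("srpski српски")` in this sketch yields one Latin word and one Cyrillic word, which mirrors the point above that the same spoken language in two scripts is reported as two different alphabets.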

Warning: returned words can contain chars from the Unicode Private Use Area (for the Yoruba or Nuer languages).
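Callers that cannot render or store such chars may want to check for them. A minimal sketch, assuming the check is done downstream on plain `&str` words; `is_private_use` and `has_private_use` are hypothetical helper names, not functions of this crate. The three ranges are the Private Use Area in the Basic Multilingual Plane plus the two supplementary private use planes.

```rust
/// True if `c` lies in one of the three Unicode private use ranges.
fn is_private_use(c: char) -> bool {
    matches!(
        c as u32,
        0xE000..=0xF8FF          // Private Use Area (BMP)
        | 0xF0000..=0xFFFFD      // Supplementary Private Use Area-A
        | 0x100000..=0x10FFFD    // Supplementary Private Use Area-B
    )
}

/// True if any char of `word` is a private use char.
fn has_private_use(word: &str) -> bool {
    word.chars().any(is_private_use)
}
```

A caller could, for example, skip or flag any detected word for which `has_private_use` returns true before passing it on to display or indexing code.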

Dependencies

~3–4MB
~68K SLoC