4 releases (breaking)
| Version | Release date |
|---|---|
| 0.4.0 (new) | May 1, 2025 |
| 0.3.0 | Apr 25, 2025 |
| 0.2.0 | Apr 23, 2025 |
| 0.1.0 | Apr 14, 2025 |
#394 in Text processing
374 downloads per month
Used in langram_train
53KB, 1K SLoC
Langram - the most accurate language detection library
308 ScriptLanguages (187 models + 121 single-language scripts)
One language can be written in multiple scripts, so each script variant is detected as a separate ScriptLanguage (language + script).
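To make the language + script pairing concrete, here is a minimal sketch in Rust. The `Language`, `Script` and `ScriptLanguage` types below are hypothetical illustrations, not langram's actual API: they only show why Serbian written in Cyrillic and Serbian written in Latin end up as two distinct detection targets.

```rust
use std::collections::HashSet;

// Hypothetical types for illustration only; NOT langram's actual API.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Language {
    English,
    Serbian,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Script {
    Cyrillic,
    Latin,
}

/// A detection target: one language written in one script.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ScriptLanguage {
    language: Language,
    script: Script,
}

fn main() {
    let sr_cyrl = ScriptLanguage { language: Language::Serbian, script: Script::Cyrillic };
    let sr_latn = ScriptLanguage { language: Language::Serbian, script: Script::Latin };
    let en_latn = ScriptLanguage { language: Language::English, script: Script::Latin };

    // Three detection targets...
    let targets: HashSet<ScriptLanguage> = [sr_cyrl, sr_latn, en_latn].into_iter().collect();
    // ...but only two underlying languages.
    let languages: HashSet<Language> = targets.iter().map(|t| t.language).collect();

    println!("{} targets, {} languages", targets.len(), languages.len());
    println!("{:?} vs {:?}", sr_cyrl.script, sr_latn.script);
}
```

Deriving `Hash` and `Eq` on the pair lets each (language, script) combination act as its own key, which is what "detected as a separate ScriptLanguage" amounts to in practice.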
Uses alphabet_detector as a word separator and language prefilter.
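The word-separation and prefilter idea can be sketched roughly as follows. This assumes a drastically simplified script classifier based on a few Unicode block ranges instead of the real alphabet_detector crate, whose API is not used here. `script_of`, `words_by_script` and `candidates` are hypothetical names, and the script-to-language table is illustrative only. A script that maps to a single candidate needs no n-gram model at all, which is one way to read the "single-language scripts" count.

```rust
// Hypothetical sketch: NOT the alphabet_detector or langram API.
// Scripts are guessed from a few Unicode block ranges only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Script {
    Latin,
    Greek,
    Cyrillic,
    Other,
}

/// Returns the (simplified) script of a letter, or `None` for non-letters,
/// which act as word separators.
fn script_of(c: char) -> Option<Script> {
    if !c.is_alphabetic() {
        return None;
    }
    Some(match c as u32 {
        0x0041..=0x024F => Script::Latin, // Basic Latin through Latin Extended-B
        0x0370..=0x03FF => Script::Greek,
        0x0400..=0x04FF => Script::Cyrillic,
        _ => Script::Other,
    })
}

/// Splits text into words: maximal runs of letters that share one script.
fn words_by_script(text: &str) -> Vec<(Script, String)> {
    let mut words: Vec<(Script, String)> = Vec::new();
    let mut prev: Option<Script> = None; // script of the previous char
    for c in text.chars() {
        let script = script_of(c);
        match script {
            // Same script as the previous letter: extend the current word.
            Some(s) if prev == Some(s) => {
                words.last_mut().expect("a word is open").1.push(c);
            }
            // Letter after a separator or a script change: start a new word.
            Some(s) => words.push((s, c.to_string())),
            // Separator: nothing to push.
            None => {}
        }
        prev = script;
    }
    words
}

/// Illustrative prefilter table: only these languages are scored for a script.
fn candidates(script: Script) -> &'static [&'static str] {
    const LATIN: &[&str] = &["English", "French", "German", "Serbian"];
    const CYRILLIC: &[&str] = &["Russian", "Ukrainian", "Serbian"];
    const GREEK: &[&str] = &["Greek"];
    const NONE: &[&str] = &[];
    match script {
        Script::Latin => LATIN,
        Script::Cyrillic => CYRILLIC,
        Script::Greek => GREEK,
        Script::Other => NONE,
    }
}

fn main() {
    for (script, word) in words_by_script("Hello, мир! Ελλάδα") {
        let langs = candidates(script);
        if langs.len() == 1 {
            // Single-language script: the answer is known without a model.
            println!("{word}: {script:?} -> {}", langs[0]);
        } else {
            // Otherwise only these candidates would be passed to the n-gram models.
            println!("{word}: {script:?} -> candidates {langs:?}");
        }
    }
}
```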
Based on a modified version of the character (not word) n-gram language model algorithm.
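For readers unfamiliar with character n-gram models, the sketch below shows a plain, unmodified version of the technique: trigram counts per language, add-one smoothing, and the language with the highest summed log-probability wins. It is a textbook baseline under stated assumptions, not langram's modified algorithm; `NgramModel`, `char_ngrams` and `detect` are hypothetical names, and the tiny training strings are placeholders for real corpora.

```rust
use std::collections::HashMap;

// Hypothetical, textbook character n-gram model; NOT langram's modified algorithm.

/// Character n-grams of one word, lowercased and padded with a space on each
/// side so that word starts and ends get their own n-grams.
fn char_ngrams(word: &str, n: usize) -> Vec<String> {
    let padded: Vec<char> = format!(" {} ", word.to_lowercase()).chars().collect();
    padded.windows(n).map(|w| w.iter().collect()).collect()
}

/// Per-language n-gram counts learned from a training text.
struct NgramModel {
    counts: HashMap<String, u32>,
    total: u32,
    n: usize,
}

impl NgramModel {
    fn train(corpus: &str, n: usize) -> Self {
        let mut counts = HashMap::new();
        let mut total = 0;
        for word in corpus.split_whitespace() {
            for g in char_ngrams(word, n) {
                *counts.entry(g).or_insert(0) += 1;
                total += 1;
            }
        }
        NgramModel { counts, total, n }
    }

    /// Sum of log-probabilities of the text's n-grams, with add-one smoothing
    /// so unseen n-grams get a small non-zero probability instead of zero.
    fn score(&self, text: &str) -> f64 {
        let denom = self.total as f64 + self.counts.len() as f64 + 1.0;
        text.split_whitespace()
            .flat_map(|w| char_ngrams(w, self.n))
            .map(|g| {
                let c = self.counts.get(&g).copied().unwrap_or(0) as f64;
                ((c + 1.0) / denom).ln()
            })
            .sum()
    }
}

/// Picks the language whose model gives the text the highest score.
fn detect<'a>(models: &'a [(&'a str, NgramModel)], text: &str) -> Option<&'a str> {
    models
        .iter()
        .map(|(name, model)| (*name, model.score(text)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(name, _)| name)
}

fn main() {
    // Placeholder training texts; real models are trained on large corpora.
    let models = vec![
        ("English", NgramModel::train("the quick brown fox jumps over the lazy dog", 3)),
        ("German", NgramModel::train("der schnelle braune fuchs springt über den faulen hund", 3)),
    ];
    println!("{:?}", detect(&models, "the dog"));
}
```

Summing log-probabilities instead of multiplying raw probabilities keeps the score from underflowing to zero on longer inputs.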
This library is a complete rewrite of Lingua: 5x faster, more accurate, supports more languages, etc.
Comparison with other language detectors
Setup
To use it, you need to patch langram_models in Cargo.toml:
- From Git:
```toml
[patch.crates-io]
langram_models = { git = "https://github.com/RoDmitry/langram_models.git" }
```
- From a pre-downloaded local copy:
```toml
[patch.crates-io]
langram_models = { path = "../langram_models" }
```
This option is more advanced: it lets you remove model n-grams so that the final executable is smaller.
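How n-grams are removed from a local langram_models copy is not described here, so the following is only an assumption-laden illustration of the size trade-off: if a model were stored as an n-gram frequency table, dropping rare entries would shrink it at some cost in accuracy. `prune_ngrams` and the sample counts are hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical illustration: langram_models' real storage format and removal
// procedure are not described in this README.

/// Drops n-grams seen fewer than `min_count` times; returns how many were removed.
fn prune_ngrams(counts: &mut HashMap<String, u32>, min_count: u32) -> usize {
    let before = counts.len();
    counts.retain(|_, count| *count >= min_count);
    before - counts.len()
}

fn main() {
    // Made-up sample counts, for illustration only.
    let mut counts: HashMap<String, u32> = [("the", 40), ("he ", 35), ("qzx", 1), ("xkq", 2)]
        .into_iter()
        .map(|(g, c)| (g.to_string(), c))
        .collect();
    let removed = prune_ngrams(&mut counts, 3);
    // Fewer entries means a smaller table to embed in the final executable.
    println!("removed {removed} rare n-grams, {} remain", counts.len());
}
```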
Dependencies
~14MB, ~404K SLoC