4 releases (breaking)

new 0.4.0 May 1, 2025
0.3.0 Apr 25, 2025
0.2.0 Apr 23, 2025
0.1.0 Apr 14, 2025

#394 in Text processing

Download history 77/week @ 2025-04-09 34/week @ 2025-04-16 263/week @ 2025-04-23

374 downloads per month
Used in langram_train

MIT/Apache

53KB
1K SLoC

Langram - the most accurate language detection library

Crate API

308 ScriptLanguages (187 models + 121 single-language scripts)

A language can be written in multiple scripts; each combination is detected as a separate ScriptLanguage (language + script).
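To illustrate the language + script idea, here is a minimal sketch. The types `Language`, `Script`, and `LangScript` are hypothetical names made up for this example and are not the crate's actual API; the real `ScriptLanguage` type may be shaped quite differently.

```rust
// Hypothetical types for illustration only; not taken from langram.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Language {
    Serbian,
    English,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Script {
    Cyrillic,
    Latin,
}

/// A detection target: one language written in one script.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct LangScript {
    language: Language,
    script: Script,
}

/// True when two detection targets share a language,
/// even if they are written in different scripts.
fn same_language(a: LangScript, b: LangScript) -> bool {
    a.language == b.language
}
```

Under this model, Serbian in Cyrillic and Serbian in Latin are two distinct detection targets that still compare as the same language.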

Uses alphabet_detector as both a word separator and a language prefilter.

Based on a modified character (not word) n-gram language model algorithm.
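As a rough illustration of what a character n-gram model works with, the sketch below extracts overlapping character n-grams from a word and scores them against a toy table of log-probabilities. This is a generic textbook formulation, not langram's modified algorithm; the function names, the toy model layout, and the fixed penalty for unseen n-grams are all assumptions made for this example.

```rust
use std::collections::HashMap;

/// Collect overlapping character n-grams of length `n` from `word`.
/// Works on `char`s (Unicode scalar values), not bytes.
fn char_ngrams(word: &str, n: usize) -> Vec<String> {
    let chars: Vec<char> = word.chars().collect();
    if n == 0 || chars.len() < n {
        return Vec::new();
    }
    chars.windows(n).map(|w| w.iter().collect()).collect()
}

/// Sum the log-probabilities of a word's n-grams under one toy model.
/// N-grams missing from the model get a fixed penalty (an assumption
/// of this sketch, standing in for real smoothing).
fn score(word: &str, n: usize, model: &HashMap<String, f64>, unseen_log_prob: f64) -> f64 {
    char_ngrams(word, n)
        .iter()
        .map(|gram| *model.get(gram).unwrap_or(&unseen_log_prob))
        .sum()
}
```

In a detector of this kind, each candidate language would have its own model, and the language with the highest total score for the input would be chosen.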

This library is a complete rewrite of Lingua: 5x faster, more accurate, with more languages, etc.

Accuracy report

Comparison with other language detectors

Setup

To use it, you need to patch langram_models in Cargo.toml:

  • From Git:
    [patch.crates-io]
    langram_models = { git = "https://github.com/RoDmitry/langram_models.git" }
  • From a predownloaded copy:
    [patch.crates-io]
    langram_models = { path = "../langram_models" }

The predownloaded copy is the more advanced option: it lets you remove model n-grams so that the final executable is smaller.

Dependencies

~14MB
~404K SLoC