6 releases
Version | Released
---|---
0.2.3 | Jun 24, 2023
0.2.2 | Jun 24, 2023
0.1.1 | Jun 4, 2023
#1684 in Text processing
Used in 2 crates
350KB, 13K SLoC
wordfreq
This crate is yet another Rust port of Python's wordfreq, allowing you to look up the frequencies of words in many languages.
Licensing
Source code is licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
wordfreq
This crate is yet another Rust port of Python's wordfreq, allowing you to look up the frequencies of words in many languages.
Note that this crate provides only the algorithms (including hardcoded standardization) and does not contain the models. Use wordfreq-model to load the distributed models easily. See the documentation for a quick start.
How to create instances without wordfreq-model
If you do not want wordfreq-model to download models automatically, you can create instances directly from the model files placed here. Each model file lists words and their frequencies in a plain-text format, one word per line:
<word1> <freq1>
<word2> <freq2>
<word3> <freq3>
...
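To make the format concrete, here is a minimal sketch of a parser for it, assuming each non-empty line holds exactly one word and one numeric weight separated by whitespace. The function `parse_word_weights` is hypothetical and only illustrates the format; in practice, the crate's own `wordfreq::word_weights_from_text` does this job.

```rust
/// Hypothetical parser for the `<word> <freq>` model format (illustration only,
/// not the crate's implementation). Assumes whitespace-separated columns.
fn parse_word_weights(text: &str) -> Result<Vec<(String, f64)>, String> {
    let mut pairs = Vec::new();
    for (lineno, line) in text.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() {
            continue; // skip blank lines
        }
        // Expect exactly two columns: the word and its weight.
        let mut cols = line.split_whitespace();
        let (word, freq) = match (cols.next(), cols.next(), cols.next()) {
            (Some(w), Some(f), None) => (w, f),
            _ => return Err(format!("line {}: expected `<word> <freq>`", lineno + 1)),
        };
        let weight: f64 = freq
            .parse()
            .map_err(|e| format!("line {}: bad frequency {:?}: {}", lineno + 1, freq, e))?;
        pairs.push((word.to_string(), weight));
    }
    Ok(pairs)
}

fn main() {
    let pairs = parse_word_weights("las 10\nvegas 30\n").unwrap();
    assert_eq!(pairs, vec![("las".to_string(), 10.0), ("vegas".to_string(), 30.0)]);
    println!("{:?}", pairs);
}
```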
You can create instances as follows:
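use approx::assert_relative_eq;
use wordfreq::WordFreq;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Two words with weights 10 and 30 (total 40).
    let word_weight_text = "las 10\nvegas 30\n";
    let word_weights = wordfreq::word_weights_from_text(word_weight_text.as_bytes())?;
    let wf = WordFreq::new(word_weights);
    assert_relative_eq!(wf.word_frequency("las"), 0.25); // 10 / 40
    assert_relative_eq!(wf.word_frequency("vegas"), 0.75); // 30 / 40
    // No standardization is applied, so "Las" does not match "las".
    assert_relative_eq!(wf.word_frequency("Las"), 0.00);
    Ok(())
}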
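The expected values follow from normalizing each weight by the total: 10 / (10 + 30) = 0.25 and 30 / 40 = 0.75. The sketch below reproduces that arithmetic with a hypothetical `ToyWordFreq` type. It is only an illustration consistent with the example's numbers, not the crate's implementation, and it assumes every word appears once in the input.

```rust
use std::collections::HashMap;

/// Hypothetical toy model: frequency = weight / (sum of all weights).
/// Not the crate's code; duplicate words would overwrite each other here.
struct ToyWordFreq {
    freqs: HashMap<String, f64>,
}

impl ToyWordFreq {
    fn new(word_weights: Vec<(String, f64)>) -> Self {
        // If all weights are zero, `total` is 0 and the results become NaN;
        // this sketch does not guard against that.
        let total: f64 = word_weights.iter().map(|(_, w)| w).sum();
        let freqs: HashMap<String, f64> = word_weights
            .into_iter()
            .map(|(word, w)| (word, w / total))
            .collect();
        Self { freqs }
    }

    /// Exact, case-sensitive lookup; unknown words get 0.0.
    fn word_frequency(&self, word: &str) -> f64 {
        self.freqs.get(word).copied().unwrap_or(0.0)
    }
}

fn main() {
    let wf = ToyWordFreq::new(vec![("las".to_string(), 10.0), ("vegas".to_string(), 30.0)]);
    assert_eq!(wf.word_frequency("las"), 0.25); // 10 / 40
    assert_eq!(wf.word_frequency("vegas"), 0.75); // 30 / 40
    assert_eq!(wf.word_frequency("Las"), 0.0); // no standardization, no match
}
```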
Standardization
If you want to standardize words before looking up their frequencies, set up an instance of Standardizer.
use approx::assert_relative_eq;
use wordfreq::Standardizer;
use wordfreq::WordFreq;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let word_weight_text = "las 10\nvegas 30\n";
    let word_weights = wordfreq::word_weights_from_text(word_weight_text.as_bytes())?;
    // Standardizer for English ("en").
    let standardizer = Standardizer::new("en")?;
    let wf = WordFreq::new(word_weights).standardizer(standardizer);
    // "Las" is standardized to "las" before the lookup.
    assert_relative_eq!(wf.word_frequency("Las"), 0.25);
    Ok(())
}
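Conceptually, standardization maps the query to the form stored in the model before the lookup. The sketch below shows that control flow with a hypothetical `toy_standardize` that only lowercases. The crate's `Standardizer` is language-specific and does more than this, so treat the helper as a stand-in, not a description of its behavior.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a standardizer: lowercases the input only.
/// The crate's real `Standardizer` is language-specific and does more.
fn toy_standardize(word: &str) -> String {
    word.to_lowercase()
}

/// Hypothetical helper: optionally standardize `word`, then look it up.
fn lookup(freqs: &HashMap<String, f64>, word: &str, standardize: bool) -> f64 {
    let key = if standardize { toy_standardize(word) } else { word.to_string() };
    freqs.get(&key).copied().unwrap_or(0.0)
}

fn main() {
    // Frequencies from the example: 10 / 40 and 30 / 40.
    let freqs: HashMap<String, f64> =
        [("las".to_string(), 0.25), ("vegas".to_string(), 0.75)].into_iter().collect();

    assert_eq!(lookup(&freqs, "Las", false), 0.0); // exact match fails
    assert_eq!(lookup(&freqs, "Las", true), 0.25); // "Las" -> "las"
    assert_eq!(lookup(&freqs, "VEGAS", true), 0.75);
}
```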
Precision errors
Even though the algorithms are the same, results may differ slightly from the original Python implementation due to floating-point precision.
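Floating-point results depend on precision and on the order of operations, so two implementations of the same formula can disagree in the last few bits. A minimal sketch with made-up values: the same sum computed in `f64` and in `f32` differs, yet the two agree within a relative tolerance. The helper `relative_eq` is hypothetical, hand-rolled in the spirit of `approx::relative_eq!` (not its exact algorithm); tolerance-based checks like this are why the examples above use `assert_relative_eq!` rather than exact equality.

```rust
/// Hypothetical helper: true if `a` and `b` agree within `max_relative`
/// of the larger magnitude. Not the exact algorithm used by `approx`.
fn relative_eq(a: f64, b: f64, max_relative: f64) -> bool {
    if a == b {
        return true; // exact matches, including both zero
    }
    let diff = (a - b).abs();
    let largest = a.abs().max(b.abs());
    diff <= largest * max_relative
}

fn main() {
    // The same sum evaluated in two precisions (illustrative values only).
    let weights: [f64; 3] = [0.1, 0.2, 0.3];
    let sum64: f64 = weights.iter().sum();
    let sum32: f32 = weights.iter().map(|&w| w as f32).sum();

    println!("f64: {:.17}, f32 widened: {:.17}", sum64, sum32 as f64);

    // The low-order bits differ...
    assert_ne!(sum64, sum32 as f64);
    // ...so compare with a tolerance instead of `==`.
    assert!(relative_eq(sum64, sum32 as f64, 1e-6));
}
```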
Unsupported features
This crate is a straightforward port of Python's wordfreq, although some features are not provided:
Dependencies
~7MB, ~137K SLoC