2 releases
| Version | Released |
|---|---|
| 0.1.1 | Jan 18, 2021 |
| 0.1.0 | Feb 8, 2018 |
stopwords-rs
Stopwords from popular text processing frameworks.
These are high-frequency grammatical words which are usually ignored in information retrieval applications.
lib.rs:
This library provides stopwords datasets from popular text processing engines.
This can help reproduce the results of text analysis pipelines written in different languages and with different tools.
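To illustrate why the choice of list matters for reproducibility, here is a minimal sketch that uses only the standard library. The two lists are hypothetical stand-ins for two frameworks that disagree on a single word; they are not the data shipped with this crate, and `filter_tokens` is a helper invented for this example.

```rust
use std::collections::HashSet;

/// Keep only the tokens that do not appear in the given stopword list.
/// `filter_tokens` is a hypothetical helper, not part of this crate.
fn filter_tokens<'a>(tokens: &[&'a str], stops: &[&str]) -> Vec<&'a str> {
    let stops: HashSet<&str> = stops.iter().copied().collect();
    tokens
        .iter()
        .copied()
        .filter(|t| !stops.contains(*t))
        .collect()
}

fn main() {
    // Hypothetical lists: imaginary framework A treats "not" as a stopword,
    // imaginary framework B does not.
    let framework_a = ["is", "to", "not"];
    let framework_b = ["is", "to"];

    let tokens = ["broccoli", "is", "not", "good", "to", "eat"];

    // The same input yields different token streams, and "not" flips the meaning.
    assert_eq!(filter_tokens(&tokens, &framework_a), vec!["broccoli", "good", "eat"]);
    assert_eq!(filter_tokens(&tokens, &framework_b), vec!["broccoli", "not", "good", "eat"]);
}
```

Matching the list used by the original pipeline removes this source of divergence when a pipeline is ported.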
Usage
```toml
[dependencies]
stopwords = "0.1.0"
```
```rust
extern crate stopwords;

use std::collections::HashSet;
use stopwords::{Spark, Language, Stopwords};

fn main() {
    // Load Spark's English stopword list into a set for fast lookups.
    let stops: HashSet<_> = Spark::stopwords(Language::English).unwrap().iter().collect();
    let mut tokens = vec!["broccoli", "is", "good", "to", "eat"];
    // Drop every token that appears in the stopword set.
    tokens.retain(|s| !stops.contains(s));
    assert_eq!(tokens, vec!["broccoli", "good", "eat"]);
}
```
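The usage example compares tokens exactly, so a capitalized `Is` or a token with trailing punctuation would not be filtered. The sketch below normalizes raw text before the lookup. It assumes the stopword list is lowercase and is passed as a slice of `&str`; the `remove_stopwords` helper and the short list in `main` are hypothetical and stand in for whichever list you load from the crate.

```rust
use std::collections::HashSet;

/// Lowercase the text, split it on non-alphanumeric characters,
/// and drop every word found in `stops`.
/// Hypothetical helper; assumes `stops` contains lowercase words.
fn remove_stopwords(text: &str, stops: &[&str]) -> Vec<String> {
    let stops: HashSet<&str> = stops.iter().copied().collect();
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .filter(|w| !stops.contains(w.as_str()))
        .collect()
}

fn main() {
    // Hypothetical stand-in for a list loaded from the crate.
    let stops = ["is", "to"];
    let kept = remove_stopwords("Broccoli IS good to eat!", &stops);
    assert_eq!(kept, vec!["broccoli", "good", "eat"]);
}
```

Splitting on every non-alphanumeric character also breaks contractions such as `don't` into two pieces, so a real pipeline may prefer the tokenizer of the framework whose results it is reproducing.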