2 releases

0.1.1 Jan 18, 2021
0.1.0 Feb 8, 2018

#2011 in Text processing

597 downloads per month
Used in lingo

MIT license

28KB
385 lines

stopwords-rs

Stopwords from popular text processing frameworks.

These are high-frequency grammatical words which are usually ignored in information retrieval applications.


lib.rs:

This library provides stopwords datasets from popular text processing engines.

This can help reproduce the results of text analysis pipelines written in different languages and with different tools.

Usage

# Cargo.toml
[dependencies]
stopwords = "0.1.0"

// src/main.rs
extern crate stopwords;

use std::collections::HashSet;
use stopwords::{Spark, Language, Stopwords};

fn main() {
    // Collect Spark's English stopword list into a set for fast lookups.
    let stops: HashSet<_> = Spark::stopwords(Language::English).unwrap().iter().collect();
    let mut tokens = vec!["broccoli", "is", "good", "to", "eat"];
    // Keep only the tokens that are not stopwords.
    tokens.retain(|s| !stops.contains(s));
    assert_eq!(tokens, vec!["broccoli", "good", "eat"]);
}
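
The usage example compares tokens exactly, so a capitalized token such as "Is" would not match a lowercase stopword entry. The following is a minimal sketch of case-insensitive filtering, assuming the stopword list is stored in lowercase. `SAMPLE_STOPWORDS` and `remove_stopwords` are hypothetical names used for illustration and are not part of this crate; the short hand-written list stands in for one of the crate's datasets so that the sketch depends only on the standard library.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a stopword list (assumed lowercase).
/// It is hand-written for this sketch and is not taken from the crate.
const SAMPLE_STOPWORDS: &[&str] = &["a", "an", "is", "the", "to"];

/// Returns the tokens that are not stopwords, comparing case-insensitively.
/// The original casing of the kept tokens is preserved.
fn remove_stopwords<'a>(tokens: &[&'a str], stopwords: &[&str]) -> Vec<&'a str> {
    // Build a set once so that each lookup is a hash probe.
    let stops: HashSet<&str> = stopwords.iter().copied().collect();
    tokens
        .iter()
        .copied()
        // Lowercase each token only for the lookup, not in the output.
        .filter(|t| !stops.contains(t.to_lowercase().as_str()))
        .collect()
}
```

Calling `remove_stopwords(&["Broccoli", "IS", "good"], SAMPLE_STOPWORDS)` would drop "IS" while keeping "Broccoli" with its original capitalization. In practice, the slice returned by a call such as `Spark::stopwords(Language::English)` could presumably be passed in place of the sample list.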

Dependencies

~240–700KB
~16K SLoC