2 releases

0.1.1 Jan 18, 2021
0.1.0 Feb 8, 2018

Used in lingo

MIT license

28KB
385 lines

stopwords-rs

Stopwords from popular text processing frameworks.

These are high-frequency grammatical words which are usually ignored in information retrieval applications.
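To make the idea concrete, here is a minimal, self-contained sketch of stopword removal in plain Rust. The short `DEMO_STOPWORDS` list and the `remove_stopwords` helper are hypothetical and are not part of this crate's data or API. Splitting on whitespace and lowercasing each token are simplifying assumptions standing in for a real tokenizer.

```rust
use std::collections::HashSet;

// A tiny illustrative stopword list (hypothetical; NOT the crate's dataset).
const DEMO_STOPWORDS: &[&str] = &["a", "an", "and", "is", "of", "the", "to"];

/// Splits `text` on whitespace, lowercases each token,
/// and drops every token that appears in `stops`.
fn remove_stopwords(text: &str, stops: &HashSet<&str>) -> Vec<String> {
    text.split_whitespace()
        .map(|t| t.to_lowercase())
        .filter(|t| !stops.contains(t.as_str()))
        .collect()
}

fn main() {
    let stops: HashSet<&str> = DEMO_STOPWORDS.iter().copied().collect();
    let kept = remove_stopwords("The cat is on the mat", &stops);
    println!("{:?}", kept); // ["cat", "on", "mat"]
}
```

Note that "on" survives only because this demo list omits it; a real framework's list would likely remove it too.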



This library provides stopwords datasets from popular text processing engines.

This can help reproduce the results of text analysis pipelines written with different languages and tools.
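Why the choice of list matters for reproducibility: the sketch below runs the same tokens through two hypothetical lists, `TOOL_A` and `TOOL_B`, which are invented for illustration and not taken from any real framework. Because one list contains "not" and the other does not, the outputs differ, which is the kind of drift that reusing the original pipeline's exact list avoids. The `apply_stopwords` helper is likewise hypothetical.

```rust
use std::collections::HashSet;

// Two hypothetical stopword lists standing in for two different tools.
// They are illustrative only and are not copied from any real framework.
const TOOL_A: &[&str] = &["a", "is", "not", "the"];
const TOOL_B: &[&str] = &["a", "is", "the"];

/// Returns the tokens that are not in `stops`, preserving their order.
fn apply_stopwords<'a>(tokens: &[&'a str], stops: &[&str]) -> Vec<&'a str> {
    let set: HashSet<&str> = stops.iter().copied().collect();
    tokens.iter().copied().filter(|t| !set.contains(*t)).collect()
}

fn main() {
    let tokens = ["this", "is", "not", "the", "answer"];
    println!("tool A: {:?}", apply_stopwords(&tokens, TOOL_A)); // ["this", "answer"]
    println!("tool B: {:?}", apply_stopwords(&tokens, TOOL_B)); // ["this", "not", "answer"]
}
```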

Usage

[dependencies]
stopwords = "0.1.0"

extern crate stopwords;

use std::collections::HashSet;
use stopwords::{Spark, Language, Stopwords};

fn main() {
    // Collect Spark's English stopwords into a set for fast lookup.
    let stops: HashSet<_> = Spark::stopwords(Language::English).unwrap().iter().collect();
    let mut tokens = vec!["broccoli", "is", "good", "to", "eat"];
    // Keep only the tokens that are not stopwords.
    tokens.retain(|s| !stops.contains(s));
    assert_eq!(tokens, vec!["broccoli", "good", "eat"]);
}
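`HashSet::contains` compares strings exactly, so a capitalized token such as "Is" would not be removed by the example above unless the list also contains that casing. If your tokens are not already lowercased, one option is to lowercase at lookup time while keeping the original token. The sketch below assumes the stopword list itself is lowercase; it uses a hypothetical stand-in slice `STOPS` and a hypothetical helper `retain_content_words` so that it runs without the `stopwords` crate.

```rust
use std::collections::HashSet;

// Hypothetical lowercase stand-in for a provider's list, so this sketch
// compiles without the `stopwords` crate.
const STOPS: &[&str] = &["is", "to", "the"];

/// Drops tokens whose lowercase form is a stopword, keeping original casing.
fn retain_content_words(tokens: &mut Vec<&str>, stops: &HashSet<&str>) {
    tokens.retain(|t| !stops.contains(t.to_lowercase().as_str()));
}

fn main() {
    let stops: HashSet<&str> = STOPS.iter().copied().collect();
    let mut tokens = vec!["Broccoli", "Is", "good", "TO", "eat"];
    retain_content_words(&mut tokens, &stops);
    assert_eq!(tokens, vec!["Broccoli", "good", "eat"]);
}
```

Lowercasing at lookup time allocates a `String` per token; lowercasing once during tokenization avoids that if the original casing is not needed.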
