2 releases

0.1.1 Jan 18, 2021
0.1.0 Feb 8, 2018

#1743 in Text processing

Download history (weekly, 2024-03-11 to 2024-06-24): between 8 and 60 downloads per week

69 downloads per month
Used in lingo

MIT license

28KB
385 lines

stopwords-rs

Stopwords from popular text processing frameworks.

These are high-frequency grammatical words which are usually ignored in information retrieval applications.


lib.rs:

This library provides stopwords datasets from popular text processing engines.

This can help reproduce the results of text analysis pipelines written in different languages and with different tools.
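
To illustrate why the choice of list matters for reproducibility, the sketch below filters the same tokens with two small hand-written lists. Both lists (LIST_A, LIST_B) and the helper remove_stopwords are hypothetical stand-ins for the datasets this crate ships; they are not taken from any real framework.

```rust
use std::collections::HashSet;

// Hypothetical stopword lists, standing in for two different frameworks.
const LIST_A: &[&str] = &["is", "to", "the"];
const LIST_B: &[&str] = &["is", "to", "the", "good"];

/// Returns the tokens that do not appear in `stopwords`, preserving order.
/// Illustrative helper only; not part of the stopwords crate.
fn remove_stopwords<'a>(tokens: &[&'a str], stopwords: &[&str]) -> Vec<&'a str> {
    let stops: HashSet<&str> = stopwords.iter().copied().collect();
    tokens
        .iter()
        .copied()
        .filter(|t| !stops.contains(*t))
        .collect()
}

fn main() {
    let tokens = ["broccoli", "is", "good", "to", "eat"];
    let a = remove_stopwords(&tokens, LIST_A);
    let b = remove_stopwords(&tokens, LIST_B);
    // The same input yields different token streams under the two lists.
    assert_eq!(a, vec!["broccoli", "good", "eat"]);
    assert_eq!(b, vec!["broccoli", "eat"]);
}
```

A pipeline ported from one tool to another would need the original tool's list to produce the same tokens, which is the gap this library aims to close.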

Usage

Add the dependency to Cargo.toml:

[dependencies]
stopwords = "0.1.0"

Then filter a list of tokens against a stopword set:

extern crate stopwords;

use std::collections::HashSet;
use stopwords::{Spark, Language, Stopwords};

fn main() {
    // Collect the Spark English stopword list into a set for fast lookup.
    let stops: HashSet<_> = Spark::stopwords(Language::English).unwrap().iter().collect();
    let mut tokens = vec!["broccoli", "is", "good", "to", "eat"];
    // Keep only the tokens that are not stopwords.
    tokens.retain(|s| !stops.contains(s));
    assert_eq!(tokens, vec!["broccoli", "good", "eat"]);
}
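
The example above matches tokens exactly, so mixed-case input such as "IS" would not be removed if the list is lowercase. A minimal sketch of normalizing before lookup follows. It assumes a lowercase list and uses a small hypothetical one (STOPS) instead of the crate's data so that it runs without dependencies; filter_tokens is an illustrative helper, not part of the crate's API.

```rust
use std::collections::HashSet;

// Hypothetical lowercase stopword list. With the crate, the set would be
// built from Spark::stopwords(Language::English) as in the example above.
const STOPS: &[&str] = &["is", "to", "the"];

/// Keeps the tokens whose lowercase form is not in `stops`.
/// The original casing of the kept tokens is preserved.
fn filter_tokens(tokens: &[&str], stops: &HashSet<&str>) -> Vec<String> {
    tokens
        .iter()
        .filter(|t| !stops.contains(t.to_lowercase().as_str()))
        .map(|t| t.to_string())
        .collect()
}

fn main() {
    let stops: HashSet<&str> = STOPS.iter().copied().collect();
    let kept = filter_tokens(&["Broccoli", "IS", "good", "To", "eat"], &stops);
    assert_eq!(kept, vec!["Broccoli", "good", "eat"]);
}
```

Lowercasing only for the lookup keeps the original tokens intact for later stages; whether that is desirable depends on the pipeline being reproduced.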

Dependencies

~270–730KB
~17K SLoC