2 releases

0.1.1 Jan 18, 2021
0.1.0 Feb 8, 2018

#1693 in Text processing

592 downloads per month
Used in lingo

MIT license

28KB
385 lines

stopwords-rs

Stopwords from popular text processing frameworks.

Stopwords are high-frequency grammatical words that are usually ignored in information retrieval applications.


lib.rs:

This library provides stopwords datasets from popular text processing engines.

This can help reproduce the results of text analysis pipelines written in different languages and with different tools.

Usage

[dependencies]
stopwords = "0.1.0"

extern crate stopwords;

use std::collections::HashSet;
use stopwords::{Spark, Language, Stopwords};

fn main() {
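    // Collect the Spark English stopword list into a set for fast lookups.
    let stops: HashSet<_> = Spark::stopwords(Language::English).unwrap().iter().collect();
    let mut tokens = vec!["broccoli", "is", "good", "to", "eat"];
    // Keep only the tokens that are not stopwords.
    tokens.retain(|s| !stops.contains(s));
    assert_eq!(tokens, vec!["broccoli", "good", "eat"]);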
}
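
The example above compares tokens exactly as written, so a capitalized token such as "Is" would not match a lowercase list entry. Below is a minimal sketch of a case-insensitive filter. It assumes the stopword list is stored in lowercase. `SAMPLE_STOPWORDS` and `remove_stopwords` are hypothetical names used for illustration only; the short list stands in for data that would normally come from the crate and is not the crate's actual dataset.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for a stopword list (assumed lowercase);
// this is NOT the data shipped by the stopwords crate.
const SAMPLE_STOPWORDS: &[&str] = &["is", "to", "the", "a", "of"];

/// Returns the tokens that are not stopwords, comparing each token
/// in lowercase while preserving its original casing in the output.
fn remove_stopwords<'a>(tokens: &[&'a str], stops: &HashSet<&str>) -> Vec<&'a str> {
    tokens
        .iter()
        .copied()
        // Lowercase only for the lookup; the token itself is kept unchanged.
        .filter(|t| !stops.contains(t.to_lowercase().as_str()))
        .collect()
}
```

To use it, build the set once, for example with `SAMPLE_STOPWORDS.iter().copied().collect::<HashSet<&str>>()`, and pass a slice of tokens. Lowercasing allocates a temporary string for each token, which is acceptable for a sketch but may be worth avoiding in a hot loop.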

Dependencies

~220–670KB
~15K SLoC