#tokenizer #nlp #machine-learning

blingfire

Wrapper for the BlingFire tokenization library

5 releases (1 stable)

1.0.0 Jun 23, 2020
0.3.0 Jul 28, 2019
0.2.1 Jul 10, 2019
0.2.0 Jul 7, 2019
0.1.0 Jul 6, 2019

#622 in Machine learning


3,549 downloads per month

MIT license

5MB
66K SLoC

C++ 66K SLoC // 0.0% comments
Rust 229 SLoC // 0.0% comments
Jupyter Notebooks 120 SLoC // 0.2% comments


BlingFire in Rust

blingfire is a thin Rust wrapper for the BlingFire tokenization library.

Add the library to Cargo.toml to get started:

cargo add blingfire

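The library exposes two functions, text_to_words and text_to_sentences. Each takes the input text and writes its result into a caller-provided String: text_to_words separates tokens with single spaces, and text_to_sentences separates sentences with newlines.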

use blingfire;

fn main() {
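    // Output buffer, reused across both calls.
    let mut parsed = String::new();

    // Tokenize into words: punctuation is split off and runs of spaces collapse.
    blingfire::text_to_words("Cat,sat on   the mat.", &mut parsed).unwrap();
    assert_eq!(parsed.as_str(), "Cat , sat on the mat .");

    // Split into sentences: the buffer now holds one sentence per line.
    blingfire::text_to_sentences("Cat sat. Dog barked.", &mut parsed).unwrap();
    assert_eq!(parsed.as_str(), "Cat sat.\nDog barked.");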
}
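
Both functions return their result as a single delimited string, so callers that want individual tokens or sentences have to split it themselves. The following is a minimal sketch of one way to do that, assuming the delimiters are exactly those shown in the example above (spaces between tokens, newlines between sentences). The helper names split_tokens and split_sentences are hypothetical and not part of the blingfire API, and the inputs are literal strings copied from the expected outputs above, so the sketch runs without the crate.

```rust
/// Hypothetical helper: splits space-separated tokenizer output into tokens.
fn split_tokens(parsed: &str) -> Vec<&str> {
    parsed.split_whitespace().collect()
}

/// Hypothetical helper: splits newline-separated output into sentences,
/// skipping any empty lines.
fn split_sentences(parsed: &str) -> Vec<&str> {
    parsed.lines().filter(|s| !s.is_empty()).collect()
}

fn main() {
    // Stand-ins for the strings written by text_to_words and
    // text_to_sentences, taken from the asserts in the example above.
    let words = "Cat , sat on the mat .";
    let sentences = "Cat sat.\nDog barked.";

    assert_eq!(
        split_tokens(words),
        ["Cat", ",", "sat", "on", "the", "mat", "."]
    );
    assert_eq!(split_sentences(sentences), ["Cat sat.", "Dog barked."]);
}
```

The returned slices borrow from the output buffer, so they need to be copied (for example with to_owned) before the buffer is reused for another call.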

The code is licensed under the MIT License.



Dependencies