#nlp #tokenizer #machine-learning #blingfire


Wrapper for the BlingFire tokenization library

3 unstable releases

✓ Uses Rust 2018 edition

0.2.1 Jul 10, 2019
0.2.0 Jul 7, 2019
0.1.0 Jul 6, 2019

MIT license

BlingFire in Rust

blingfire is a thin Rust wrapper for the BlingFire tokenization library.

Add the library to your project's Cargo.toml to get started:

cargo add blingfire

The library exposes two functions, text_to_words and text_to_sentences. Each takes the input text and writes its result into a caller-supplied String:

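use blingfire;

fn main() {
    // Output buffer, reused by both calls below.
    let mut parsed = String::with_capacity(128);

    // Tokenize: punctuation is split off and runs of whitespace are collapsed.
    blingfire::text_to_words("Cat,sat on   the mat.", &mut parsed).unwrap();
    assert_eq!(parsed.as_str(), "Cat , sat on the mat .");

    // Split into sentences, one per line. As the assertion shows, the
    // previous contents of `parsed` are replaced, not appended to.
    blingfire::text_to_sentences("Cat sat. Dog barked.", &mut parsed).unwrap();
    assert_eq!(parsed.as_str(), "Cat sat.\nDog barked.");
}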
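
Both functions return their result as a single string, so callers who want individual tokens or sentences need to split it themselves. The sketch below shows one way to do that. It assumes the output format is exactly what the assertions above show (tokens separated by single spaces, sentences separated by newlines). The helpers `split_words` and `split_sentences` are hypothetical names used for illustration and are not part of the blingfire crate. To keep the example runnable without the crate, it operates on the output strings from the example above.

```rust
/// Hypothetical helper (not part of blingfire): splits tokenizer output,
/// assumed to be tokens separated by spaces, into individual tokens.
fn split_words(parsed: &str) -> Vec<&str> {
    parsed.split(' ').filter(|token| !token.is_empty()).collect()
}

/// Hypothetical helper (not part of blingfire): splits sentence output,
/// assumed to be one sentence per line, into individual sentences.
fn split_sentences(parsed: &str) -> Vec<&str> {
    parsed.lines().filter(|sentence| !sentence.is_empty()).collect()
}

fn main() {
    // These literals are the outputs asserted in the README example.
    let words = split_words("Cat , sat on the mat .");
    assert_eq!(words, vec!["Cat", ",", "sat", "on", "the", "mat", "."]);

    let sentences = split_sentences("Cat sat.\nDog barked.");
    assert_eq!(sentences, vec!["Cat sat.", "Dog barked."]);
}
```

The returned slices borrow from the output buffer, so they must be dropped or copied before the buffer is passed to another call.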

The code is licensed under the MIT License.