crossandra

A straightforward tokenization library for seamless text processing

2 releases

new 0.0.2 Dec 19, 2024
0.0.1 Dec 12, 2024

#4 in #tokenization

125 downloads per month

MIT license

605KB
2K SLoC

crossandra-rs

crossandra-rs is a work-in-progress ⚠️ tokenization library for straightforward, seamless text processing. It is a simplified Rust implementation of the Python Crossandra library.

Usage

Add this to your Cargo.toml:

[dependencies]
crossandra = "0.0.1"

Then import and use it like this:

use crossandra::{Tokenizer, common};

fn main() {
    let word_finder = Tokenizer::default()
        .with_patterns(vec![common::WORD.clone()])
        .expect("built-in pattern should be safe");

    let text = "Hello, world!";

    for token in word_finder.tokenize(text).flatten() {
        println!("{:?}", token);
    }
    // Token { name: "word", value: "Hello", position: 0 }
    // Token { name: "word", value: "world", position: 7 }
}
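
To make the output above more concrete, here is a minimal, std-only sketch of what "find every word and report where it starts" could look like. It is not crossandra's implementation and does not use its API: `WordToken` and `find_words` are hypothetical names introduced purely for illustration, the definition of a "word" (letters, digits, and underscores) is an assumption, and so is treating `position` as a byte offset.

```rust
/// Hypothetical token type mirroring the fields printed in the example output.
/// This is NOT crossandra's own `Token` definition.
#[derive(Debug, PartialEq)]
struct WordToken<'a> {
    name: &'static str,
    value: &'a str,
    /// Assumed here to be the byte offset of the token's first character.
    position: usize,
}

/// Collects maximal runs of alphanumeric characters (plus `_`) with their offsets.
fn find_words(text: &str) -> Vec<WordToken<'_>> {
    let mut tokens = Vec::new();
    // Byte offset where the current word started, if we are inside one.
    let mut start: Option<usize> = None;

    for (i, ch) in text.char_indices() {
        if ch.is_alphanumeric() || ch == '_' {
            if start.is_none() {
                start = Some(i);
            }
        } else if let Some(s) = start.take() {
            // A non-word character ends the current word.
            tokens.push(WordToken { name: "word", value: &text[s..i], position: s });
        }
    }

    // Flush a word that runs to the end of the input.
    if let Some(s) = start {
        tokens.push(WordToken { name: "word", value: &text[s..], position: s });
    }

    tokens
}
```

Under these assumptions, `find_words("Hello, world!")` returns two tokens: `Hello` at position 0 and `world` at position 7, matching the shape of the output shown above. For real use, the library's pattern-based `Tokenizer` is the intended route; this sketch only illustrates the idea of pairing each match with its starting offset.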

Documentation

The documentation is available at docs.rs/crossandra.

Acknowledgements

Huge thanks to @Maneren for his invaluable guidance in developing this library 🫶

License

crossandra-rs is licensed under the MIT License.
© trag1c, 2024

Dependencies

~4.5MB
~85K SLoC