
crossandra

A straightforward tokenization library for seamless text processing

2 releases

0.0.2 Dec 19, 2024
0.0.1 Dec 12, 2024

MIT license

crossandra-rs

crossandra-rs is a work-in-progress ⚠️, straightforward tokenization library for seamless text processing. It is a simplified Rust implementation of the Python Crossandra library.

Usage

Add this to your Cargo.toml:

[dependencies]
crossandra = "0.0.2"

Import and use like this:

use crossandra::{Tokenizer, common};

fn main() {
    let word_finder = Tokenizer::default()
        .with_patterns(vec![common::WORD.clone()])
        .expect("built-in pattern should be safe");

    let text = "Hello, world!";

    for token in word_finder.tokenize(text).flatten() {
        println!("{:?}", token);
    }
    // Prints:
    // Token { name: "word", value: "Hello", position: 0 }
    // Token { name: "word", value: "world", position: 7 }
}
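
To make the output above easier to read, here is a minimal, std-only sketch of what a word tokenizer that reports a name, a value, and a start position might do. It is not crossandra's implementation and does not use its API: the `Token` struct and the `find_words` function are hypothetical stand-ins, a "word" is assumed to be a run of alphanumeric characters, and `position` is assumed to be a byte offset (which matches the character index for ASCII input such as the example).

```rust
/// Hypothetical stand-in for the token shape printed above;
/// this is not crossandra's own type.
#[derive(Debug, PartialEq)]
struct Token {
    name: String,
    value: String,
    position: usize,
}

/// Collects runs of alphanumeric characters as "word" tokens and
/// records the byte offset at which each run starts (an assumption).
fn find_words(text: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut start: Option<usize> = None;

    for (i, ch) in text.char_indices() {
        match (ch.is_alphanumeric(), start) {
            // A word begins at byte offset `i`.
            (true, None) => start = Some(i),
            // The current word ended just before byte offset `i`.
            (false, Some(s)) => {
                tokens.push(Token {
                    name: "word".to_string(),
                    value: text[s..i].to_string(),
                    position: s,
                });
                start = None;
            }
            // Either still inside a word or still between words.
            _ => {}
        }
    }

    // Flush a word that runs to the end of the input.
    if let Some(s) = start {
        tokens.push(Token {
            name: "word".to_string(),
            value: text[s..].to_string(),
            position: s,
        });
    }

    tokens
}

fn main() {
    for token in find_words("Hello, world!") {
        println!("{:?}", token);
    }
}
```

The punctuation and the space are skipped rather than reported, which is why the second token starts at position 7 instead of directly after the first.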

Documentation

The documentation is available at docs.rs/crossandra.

Acknowledgements

Huge thanks to @Maneren for his invaluable guidance in developing this library 🫶

License

crossandra-rs is licensed under the MIT License.
© trag1c, 2024
