token_processor

A fast, streaming-oriented token processor for Large Language Model output in Rust. It operates on text tokens or chunks that have already been decoded.

7 releases

new 0.1.6 Apr 22, 2025
0.1.5 Apr 22, 2025

Features

  • Streaming Handlers: Callbacks on tag open, data chunks, and close events in real time.
  • Buffered Handlers: Collect full payload between tags and invoke an async callback on close.
  • High Performance: Uses aho-corasick for efficient multi-pattern scanning, including cross‐chunk matches.
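The cross-chunk matching mentioned above can be illustrated with a small std-only sketch. This is not the crate's implementation: it swaps Aho-Corasick for a naive `str::find` per pattern to stay dependency-free, and `ChunkScanner`, `Event`, `feed`, and `flush` are hypothetical names invented for this illustration. The idea it shows is the hold-back: when a chunk ends with text that could be the start of a tag, that tail is kept until the next chunk arrives instead of being emitted as plain text.

```rust
/// Events produced by the toy scanner (hypothetical, not the crate's types).
#[derive(Debug, PartialEq)]
enum Event {
    Text(String),
    Tag(usize), // index into the pattern list
}

/// Hypothetical helper that finds tags even when they are split across chunks.
struct ChunkScanner {
    patterns: Vec<String>,
    pending: String, // unprocessed tail carried over from earlier chunks
}

impl ChunkScanner {
    fn new(patterns: &[&str]) -> Self {
        Self {
            patterns: patterns.iter().map(|p| p.to_string()).collect(),
            pending: String::new(),
        }
    }

    /// Feed one chunk and return the events that are now unambiguous.
    fn feed(&mut self, chunk: &str) -> Vec<Event> {
        self.pending.push_str(chunk);
        let mut events = Vec::new();
        loop {
            // Earliest complete match of any pattern (naive search, not Aho-Corasick).
            let hit = self
                .patterns
                .iter()
                .enumerate()
                .filter_map(|(i, p)| self.pending.find(p.as_str()).map(|pos| (pos, i)))
                .min();
            let Some((pos, i)) = hit else { break };
            if pos > 0 {
                events.push(Event::Text(self.pending[..pos].to_string()));
            }
            events.push(Event::Tag(i));
            let end = pos + self.patterns[i].len();
            self.pending = self.pending.split_off(end);
        }
        // Emit everything except a tail that might be the start of a tag.
        let cut = self.pending.len() - self.holdback_len();
        if cut > 0 {
            let text: String = self.pending.drain(..cut).collect();
            events.push(Event::Text(text));
        }
        events
    }

    /// Length of the longest suffix of `pending` that is a proper prefix of a pattern.
    fn holdback_len(&self) -> usize {
        let mut best = 0;
        for p in &self.patterns {
            for len in (1..p.len()).rev() {
                if p.is_char_boundary(len) && self.pending.ends_with(&p[..len]) {
                    best = best.max(len);
                    break;
                }
            }
        }
        best
    }

    /// At end of stream, whatever was held back is plain text after all.
    fn flush(&mut self) -> Option<String> {
        if self.pending.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.pending))
        }
    }
}

fn main() {
    let mut scanner = ChunkScanner::new(&["<think>", "</think>"]);
    let mut events = Vec::new();
    // Both tags are split across chunk boundaries on purpose.
    for chunk in ["Hello <thi", "nk>wor", "ld</", "think>!"] {
        events.extend(scanner.feed(chunk));
    }
    if let Some(rest) = scanner.flush() {
        events.push(Event::Text(rest));
    }
    println!("{:?}", events);
}
```

A real multi-pattern automaton avoids rescanning the buffer for every pattern, which is presumably why the crate relies on aho-corasick; the hold-back logic is the part this sketch is meant to convey.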

Installation

Add this to your Cargo.toml:

[dependencies]
token_processor = { git = "https://github.com/ljt019/token_processor" }

Or use cargo:

cargo add token_processor

Quickstart

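use token_processor::{Tag, TokenProcessorBuilder};

// The processor API is async, so this example runs on the tokio runtime.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut processor = TokenProcessorBuilder::new(1024)
        // Streaming tag: separate callbacks for open, each data chunk, and close.
        .streaming_tag(
            Tag::new("<think>"),
            || print!("[open] "),
            |chunk: &str| print!("{}", chunk),
            || print!(" [close]"),
        )
        // Buffered tag: the full payload is collected and passed to an async callback on close.
        .buffered_tag(Tag::new("<tool>"), |payload: String| async move {
            println!("[tool payload] {}", payload);
        })
        // Callback for raw token text.
        .raw_tokens(|chunk: &str| print!("{}", chunk))
        .build()?;

    // Feed decoded text, then flush once the stream has ended.
    processor.process("Hello <think>world</think> <tool>data</tool>!").await?;
    processor.flush().await?;
    Ok(())
}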

Examples

Explore the examples/ folder for more usage scenarios:

  • simple.rs – raw tokens only
  • streaming_tags.rs – streaming‐mode tag handling
  • buffered_tags.rs – buffered‐mode tag handling

Testing

Run the full test suite:

cargo test

License

Licensed under MIT OR Apache‐2.0. See LICENSE for details.
