7 releases
Version | Released |
---|---|
0.1.6 (new) | Apr 22, 2025 |
0.1.5 | Apr 22, 2025 |
token_processor
A fast, streaming‐oriented token processor for Large Language Model (LLM) output, written in Rust.
It expects text tokens/chunks that have already been decoded.
Features
- Streaming Handlers: Callbacks on tag open, data chunks, and close events in real time.
- Buffered Handlers: Collect full payload between tags and invoke an async callback on close.
- High Performance: Uses aho-corasick for efficient multi-pattern scanning, including cross‐chunk matches.
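Cross‐chunk matching means a tag such as `<think>` is still recognized when the stream splits it across two chunks. The sketch below illustrates the idea only and is not the crate's implementation: it uses a plain `str::find` for a single tag rather than an aho-corasick automaton, and the names `CrossChunkScanner`, `Piece`, `feed`, and `flush` are hypothetical. It holds back any trailing text that could be the beginning of the tag until the next chunk arrives.

```rust
/// Hypothetical helper, NOT part of token_processor: finds one tag in a
/// stream of chunks, even when the tag is split across chunk boundaries.
struct CrossChunkScanner {
    tag: String,
    /// Text received but not yet emitted (may end with a partial tag).
    pending: String,
}

#[derive(Debug, PartialEq)]
enum Piece {
    Text(String),
    Tag,
}

impl CrossChunkScanner {
    fn new(tag: &str) -> Self {
        assert!(!tag.is_empty(), "tag must not be empty");
        Self { tag: tag.to_string(), pending: String::new() }
    }

    /// Feed one chunk; returns the text and tag pieces that are now certain.
    fn feed(&mut self, chunk: &str) -> Vec<Piece> {
        self.pending.push_str(chunk);
        let mut out = Vec::new();

        // Emit every complete occurrence of the tag seen so far.
        while let Some(pos) = self.pending.find(&self.tag) {
            if pos > 0 {
                out.push(Piece::Text(self.pending[..pos].to_string()));
            }
            out.push(Piece::Tag);
            let end = pos + self.tag.len();
            self.pending.replace_range(..end, "");
        }

        // Hold back the longest suffix that is a proper prefix of the tag,
        // because the next chunk might complete it.
        let keep = (1..self.tag.len())
            .rev()
            .filter(|&n| self.tag.is_char_boundary(n))
            .find(|&n| self.pending.ends_with(&self.tag[..n]))
            .unwrap_or(0);
        let emit_len = self.pending.len() - keep;
        if emit_len > 0 {
            out.push(Piece::Text(self.pending[..emit_len].to_string()));
            self.pending.replace_range(..emit_len, "");
        }
        out
    }

    /// At end of stream, release whatever was held back as plain text.
    fn flush(&mut self) -> Option<String> {
        if self.pending.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.pending))
        }
    }
}
```

With the tag `<think>`, feeding `"Hello <thi"` emits only the text `"Hello "` and holds back `"<thi"`; feeding `"nk>world"` next emits the tag followed by the text `"world"`. A multi-pattern automaton does the same job for many tags in one pass.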
Installation
Add this to your Cargo.toml:
[dependencies]
token_processor = { git = "https://github.com/ljt019/token_processor" }
Or use cargo:
cargo add token_processor
Quickstart
use token_processor::{Tag, TokenProcessorBuilder};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut processor = TokenProcessorBuilder::new(1024)
        .streaming_tag(
            Tag::new("<think>"),
            || print!("[open] "),
            |chunk: &str| print!("{}", chunk),
            || print!(" [close]"),
        )
        .buffered_tag(Tag::new("<tool>"), |payload: String| async move {
            println!("[tool payload] {}", payload);
        })
        .raw_tokens(|chunk: &str| print!("{}", chunk))
        .build()?;

    processor.process("Hello <think>world</think> <tool>data</tool>!").await?;
    processor.flush().await?;
    Ok(())
}
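The quickstart registers one streaming handler and one buffered handler. The difference between the two modes can be sketched with the standard library alone. This is a conceptual illustration, not the crate's internals: the `TagEvent` enum, `stream_to`, and `Buffered` are hypothetical names, and the sketch is synchronous, whereas the crate's buffered callback is async.

```rust
/// Hypothetical event type for illustration only; token_processor's real
/// internal representation is not documented here.
enum TagEvent<'a> {
    Open,
    Data(&'a str),
    Close,
}

/// Streaming mode: react to every event as soon as it arrives.
fn stream_to(out: &mut String, event: &TagEvent) {
    match event {
        TagEvent::Open => out.push_str("[open] "),
        TagEvent::Data(chunk) => out.push_str(chunk),
        TagEvent::Close => out.push_str(" [close]"),
    }
}

/// Buffered mode: accumulate data silently and hand over the complete
/// payload only when the closing tag is seen.
#[derive(Default)]
struct Buffered {
    payload: String,
}

impl Buffered {
    fn on_event(&mut self, event: &TagEvent) -> Option<String> {
        match event {
            TagEvent::Open => {
                self.payload.clear();
                None
            }
            TagEvent::Data(chunk) => {
                self.payload.push_str(chunk);
                None
            }
            // Return the collected payload and leave the buffer empty.
            TagEvent::Close => Some(std::mem::take(&mut self.payload)),
        }
    }
}
```

Given the events `Open`, `Data("wor")`, `Data("ld")`, `Close`, the streaming function produces output incrementally and ends with `[open] world [close]`, while `Buffered::on_event` returns `None` three times and then `Some("world")`. Streaming suits text shown to a user as it is generated; buffering suits payloads that only make sense when complete, such as a tool call.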
Examples
Explore the examples/ folder for more usage scenarios:
- simple.rs: raw tokens only
- streaming_tags.rs: streaming‐mode tag handling
- buffered_tags.rs: buffered‐mode tag handling
Testing
Run the full test suite:
cargo test
License
Licensed under MIT OR Apache‐2.0. See LICENSE for details.