
chonkier

🦛 Chonkie, now in Rust 🦀: No-nonsense, ultra-fast, ultra-light chunking library

2 releases

Uses new Rust 2024

new 0.0.2 Apr 24, 2025
0.0.1 Apr 14, 2025

#305 in Text processing


110 downloads per month

Apache-2.0

245KB
1K SLoC

🦛 ChonkieR 🦀✨


The no-nonsense, lightweight and fast chunking library that's ready to CHONK your text, in Rust 🦀!


Chonkie just got low-leveled! 🦀 Your favorite Python chunking library is now in Rust~ even faster, smaller, and more reliable than ever!

🦀 Rusty & Reliable: Built with Rust for memory safety and performance.
🚀 Feature-rich: All the CHONKs you'd ever need
✨ Easy to use: Add Crate, Use Crate, CHONK
⚡ Blazingly Fast: CHONK at the speed of Rust! zooooom
🪶 Light-weight: No bloat, just CHONK
🦛 Cute CHONK mascot: psst it's a pygmy hippo btw
❤️ Moto Moto's favorite Rust library

ChonkieR is a chunking library that "just works" ✨

Installation

To add ChonkieR to your project, run:

cargo add chonkier # Or add it to your Cargo.toml

ChonkieR follows the rule of minimum dependencies: optional functionality is enabled through Cargo features. Don't want to think about it? Simply enable all features (not recommended for production binaries unless you need them):

# Cargo.toml
[dependencies]
chonkier = { version = "0.0.2", features = ["all"] } # Replace with desired version

Usage

Here's a basic example to get you started:

use chonkier::CharacterTokenizer;
use chonkier::RecursiveChunker;
use chonkier::types::RecursiveRules;

fn main() {
    // Initialize the chunker
    let chunker = RecursiveChunker::new(CharacterTokenizer::new(), 512, RecursiveRules::default());

    // Chunk some text (the chunk type is inferred, so no extra import is needed)
    let text = "ChonkieR is the goodest boi! My favorite chunking hippo hehe.";
    let chunks = chunker.chunk(text);

    // Access chunks
    for chunk in chunks {
        println!("Chunk: {}", chunk.text);
        println!("Tokens: {}", chunk.token_count);
    }
}

Check out more usage examples in the examples folder!

Chunkers

ChonkieR currently supports the following chunkers:

  • TokenChunker: Split text into fixed-size token chunks.
  • SentenceChunker: Split text into chunks based on sentence boundaries.
  • RecursiveChunker: Recursively split the text into chunks based on the rules provided.
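
To get a feel for what these three strategies do, here is a minimal std-only sketch. It is not ChonkieR's implementation or API: the function names (token_chunks, sentence_chunks, recursive_chunks), the delimiter lists, and the "one character = one token" rule are all assumptions made for illustration.

```rust
// Illustrative sketch only: these helpers are NOT the ChonkieR API.
// Assumption: one character counts as one token.

/// Fixed-size chunking: cut the text every `chunk_size` characters.
fn token_chunks(text: &str, chunk_size: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    let chars: Vec<char> = text.chars().collect();
    chars
        .chunks(chunk_size)
        .map(|c| c.iter().collect::<String>())
        .collect()
}

/// Sentence chunking: split after '.', '!' or '?', then greedily pack whole
/// sentences into chunks of at most `chunk_size` characters. A single
/// sentence longer than `chunk_size` is kept whole as an oversized chunk.
fn sentence_chunks(text: &str, chunk_size: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut current_len = 0;
    for sentence in text.split_inclusive(|c: char| matches!(c, '.' | '!' | '?')) {
        let len = sentence.chars().count();
        if current_len > 0 && current_len + len > chunk_size {
            chunks.push(std::mem::take(&mut current));
            current_len = 0;
        }
        current.push_str(sentence);
        current_len += len;
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

/// Recursive chunking: split on the first delimiter, pack the pieces greedily,
/// and re-split any piece that is still too long using the remaining
/// delimiters. With no delimiters left, fall back to fixed-size cuts.
fn recursive_chunks(text: &str, chunk_size: usize, delimiters: &[&str]) -> Vec<String> {
    if text.chars().count() <= chunk_size {
        return if text.is_empty() { vec![] } else { vec![text.to_string()] };
    }
    let Some((delim, rest)) = delimiters.split_first() else {
        return token_chunks(text, chunk_size);
    };
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut current_len = 0;
    for piece in text.split_inclusive(*delim) {
        let len = piece.chars().count();
        if len > chunk_size {
            // Flush what we have, then descend to the next delimiter level.
            if !current.is_empty() {
                chunks.push(std::mem::take(&mut current));
                current_len = 0;
            }
            chunks.extend(recursive_chunks(piece, chunk_size, rest));
            continue;
        }
        if current_len + len > chunk_size {
            chunks.push(std::mem::take(&mut current));
            current_len = 0;
        }
        current.push_str(piece);
        current_len += len;
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    let text = "ChonkieR is the goodest boi! My favorite chunking hippo hehe.";
    println!("{:?}", token_chunks(text, 16));
    println!("{:?}", sentence_chunks(text, 40));
    println!("{:?}", recursive_chunks(text, 16, &["\n\n", ". ", " "]));
}
```

All three helpers keep delimiters and whitespace attached to the pieces, so concatenating the chunks gives back the original text. A real chunker would count tokens with an actual tokenizer rather than characters.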

Acknowledgements

ChonkieR would like to CHONK its way through a special thanks to all the users and contributors who have helped make this library what it is today! Your feedback, issue reports, and improvements have helped make ChonkieR the CHONKIEST it can be.

And of course, special thanks to Moto Moto for endorsing ChonkieR with his famous quote:

"I like them big, I like them chonkieR." ~ Moto Moto (He really said this)

Citation

If you use ChonkieR in your research, please cite it as follows:

@software{chonkie2025,
  author = {Minhas, Bhavnick AND Nigam, Shreyash},
  title = {Chonkie: A no-nonsense fast, lightweight, and efficient text chunking library},
  year = {2025},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/chonkie-inc/chonkie}},
}

Dependencies

~2–15MB
~158K SLoC