# trueno-rag

SIMD-accelerated RAG pipeline built on Trueno compute primitives. Part of the Sovereign AI Stack.
## Features

- **Pure Rust** - zero Python/C++ dependencies
- **Chunking** - recursive, fixed, sentence, paragraph, semantic, and structural strategies
- **Hybrid Retrieval** - dense (vector) + sparse (BM25) search
- **Fusion** - RRF, Linear, DBSF, Convex, Union, Intersection
- **Reranking** - lexical, cross-encoder, and composite rerankers
- **Metrics** - Recall, Precision, MRR, NDCG, MAP
- **Semantic Embeddings** - production ONNX models via FastEmbed (optional)
- **Nemotron Embeddings** - NVIDIA Embed Nemotron 8B via GGUF (optional)
- **Index Compression** - LZ4/ZSTD compressed persistence (optional)
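Of the fusion strategies listed, Reciprocal Rank Fusion (RRF) is the most widely used: each retriever's ranked list contributes `1/(k + rank)` to a document's fused score. The sketch below is a generic, std-only illustration of that formula, not trueno-rag's internal implementation:

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: every ranked list adds 1/(k + rank) to a
/// document's fused score; k (often 60) damps the head of each list.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (rank, doc) in ranking.iter().enumerate() {
            // ranks are 1-based in the usual RRF formulation
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + (rank + 1) as f64);
        }
    }
    let mut fused: Vec<_> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    let dense = vec!["a", "b", "c"]; // e.g. from vector search
    let sparse = vec!["a", "c", "d"]; // e.g. from BM25
    // "a" tops both lists, so it gets the highest fused score
    let fused = rrf_fuse(&[dense, sparse], 60.0);
    println!("{fused:?}");
}
```

Because RRF only looks at ranks, it needs no score normalization across retrievers, which is why it pairs well with mixed dense + sparse results.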
## Installation

```toml
[dependencies]
trueno-rag = "0.1.8"
```
## Quick Start

```rust
use trueno_rag::{
    pipeline::RagPipelineBuilder,
    chunk::RecursiveChunker,
    embed::MockEmbedder,
    rerank::LexicalReranker,
    fusion::FusionStrategy,
    Document,
};

let mut pipeline = RagPipelineBuilder::new()
    .chunker(RecursiveChunker::new(512, 50))
    .embedder(MockEmbedder::new(384))
    .reranker(LexicalReranker::new())
    .fusion(FusionStrategy::RRF { k: 60.0 })
    .build()?;

let doc = Document::new("Your content here...").with_title("Doc Title");
pipeline.index_document(&doc)?;

let (results, context) = pipeline.query_with_context("your query", 5)?;
```
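In the builder above, `RecursiveChunker::new(512, 50)` appears to take a chunk size and an overlap (an assumption; check the crate docs for the exact semantics). The general fixed-size-with-overlap idea can be sketched in plain Rust:

```rust
/// Split `text` into chunks of at most `size` characters, where consecutive
/// chunks share `overlap` characters so context at a boundary lands in both.
fn chunk_with_overlap(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < size, "overlap must be smaller than chunk size");
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap; // how far each chunk's start advances
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}

fn main() {
    // size 4, overlap 2 -> starts advance by 2: "abcd", "cdef", "efgh", "ghij"
    println!("{:?}", chunk_with_overlap("abcdefghij", 4, 2));
}
```

A recursive chunker refines this by preferring to split on natural boundaries (paragraphs, then sentences, then words) before falling back to a hard character cut.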
## Examples

```bash
# Basic examples
cargo run --example basic_rag
cargo run --example chunking_strategies
cargo run --example hybrid_search
cargo run --example metrics_evaluation

# With semantic embeddings (downloads ~90MB ONNX model on first run)
cargo run --example semantic_embeddings --features embeddings

# With compressed index persistence
cargo run --example compressed_index --features compression

# With NVIDIA Nemotron embeddings (requires GGUF model file)
NEMOTRON_MODEL_PATH=/path/to/model.gguf cargo run --example nemotron_embeddings --features nemotron
```
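The metrics_evaluation example exercises the metrics listed under Features. As a sanity check on what those numbers mean, Recall@k and MRR can be computed by hand; this is a self-contained illustration of the standard definitions, not the crate's API:

```rust
use std::collections::HashSet;

/// Recall@k: fraction of the relevant documents found in the top-k results.
fn recall_at_k(retrieved: &[&str], relevant: &HashSet<&str>, k: usize) -> f64 {
    let hits = retrieved.iter().take(k).filter(|d| relevant.contains(*d)).count();
    hits as f64 / relevant.len() as f64
}

/// MRR: mean over queries of 1/rank of the first relevant result (0 if none).
fn mrr(per_query: &[(Vec<&str>, HashSet<&str>)]) -> f64 {
    let sum: f64 = per_query
        .iter()
        .map(|(retrieved, relevant)| {
            retrieved
                .iter()
                .position(|d| relevant.contains(d))
                .map_or(0.0, |i| 1.0 / (i as f64 + 1.0))
        })
        .sum();
    sum / per_query.len() as f64
}

fn main() {
    let relevant: HashSet<&str> = ["a", "c"].into_iter().collect();
    let retrieved = vec!["b", "a", "c", "d"];
    println!("recall@3 = {}", recall_at_k(&retrieved, &relevant, 3)); // 1.0
    // first relevant result is at rank 2, so 1/2
    println!("mrr = {}", mrr(&[(retrieved, relevant)])); // 0.5
}
```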
## Optional Features

### Semantic Embeddings (FastEmbed)

Production-quality vector embeddings via FastEmbed (ONNX Runtime):

```toml
trueno-rag = { version = "0.1.8", features = ["embeddings"] }
```

```rust
use trueno_rag::embed::{FastEmbedder, EmbeddingModelType, Embedder};

let embedder = FastEmbedder::new(EmbeddingModelType::AllMiniLmL6V2)?;
let embedding = embedder.embed("Hello, world!")?; // 384-dimensional embeddings
```

Available models:

- `AllMiniLmL6V2` - fast, 384 dims (default)
- `AllMiniLmL12V2` - better quality, 384 dims
- `BgeSmallEnV15` - balanced, 384 dims
- `BgeBaseEnV15` - higher quality, 768 dims
- `NomicEmbedTextV1` - retrieval optimized, 768 dims
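Whichever model produces the vectors, dense retrieval ultimately compares embeddings by similarity, and cosine similarity is the standard choice for sentence-embedding models like these. A std-only illustration of the formula (not trueno-rag's internal code):

```rust
/// Cosine similarity: dot(a, b) / (|a| * |b|); 1.0 for identical directions,
/// 0.0 for orthogonal vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    let a = [1.0, 0.0, 1.0];
    let b = [1.0, 0.0, 1.0];
    let c = [0.0, 1.0, 0.0];
    println!("{}", cosine_similarity(&a, &b)); // identical -> 1.0
    println!("{}", cosine_similarity(&a, &c)); // orthogonal -> 0.0
}
```

If a model already L2-normalizes its output, the denominator is 1 and cosine similarity reduces to a plain dot product, which is what SIMD backends accelerate.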
### NVIDIA Embed Nemotron 8B

High-quality 4096-dimensional embeddings via GGUF model inference:

```toml
trueno-rag = { version = "0.1.8", features = ["nemotron"] }
```

```rust
use trueno_rag::embed::{NemotronEmbedder, NemotronConfig, Embedder};

let config = NemotronConfig::new("models/NV-Embed-v2-Q4_K.gguf")
    .with_gpu(true)
    .with_normalize(true);
let embedder = NemotronEmbedder::new(config)?;

// Asymmetric retrieval - different prefixes for queries vs documents
let query_emb = embedder.embed_query("What is machine learning?")?;
let doc_emb = embedder.embed_document("Machine learning is a branch of AI...")?;
```
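Asymmetric retrieval models encode queries and documents differently, typically by prepending role-specific instruction prefixes before embedding, so short questions and long passages land in compatible regions of the vector space. The prefix strings below are illustrative placeholders, not the ones `NemotronEmbedder` actually uses:

```rust
/// Asymmetric embedding models prepend role-specific prefixes so the encoder
/// can treat short queries and long passages differently. The exact strings
/// are model-specific; "query: "/"passage: " here are placeholders.
fn prepare_query(text: &str) -> String {
    format!("query: {text}")
}

fn prepare_document(text: &str) -> String {
    format!("passage: {text}")
}

fn main() {
    println!("{}", prepare_query("What is machine learning?"));
    println!("{}", prepare_document("Machine learning is a branch of AI..."));
}
```

This is why the API exposes `embed_query` and `embed_document` separately: embedding a document with the query path (or vice versa) degrades retrieval quality.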
### Index Compression

LZ4/ZSTD compressed index persistence:

```toml
trueno-rag = { version = "0.1.8", features = ["compression"] }
```

```rust
use trueno_rag::{compressed::Compression, BM25Index};

// Given a populated BM25Index bound to `index`:
let bytes = index.to_compressed_bytes(Compression::Zstd)?; // typically 4-6x compression ratio
```
## Stack Dependencies

`trueno-rag` is part of the Sovereign AI Stack:
| Crate | Version | Purpose |
|---|---|---|
| trueno | 0.11 | SIMD/GPU compute primitives |
| trueno-db | 0.3.10 | GPU-first analytics database |
| realizar | 0.5.1 | GGUF/APR model inference |
| fastembed | 5.x | ONNX embeddings |
## Development

```bash
make test      # Run tests
make lint      # Clippy lints
make coverage  # Coverage report (95%+ target)
make book      # Build documentation book
```
## License

MIT