7 releases
Latest: 0.1.6 (Apr 1, 2025); previous: 0.1.5 (Mar 31, 2025)
airust
airust is a modular, trainable AI library written in Rust.
It embeds knowledge at compile time from JSON files and provides several prediction engines (exact, fuzzy, and BM25-based matching) for natural-language input.
AiRust Capabilities
What You Can Do in Practice:
1. Build Your Own AI Agents
- Train agents with examples (Question → Answer)
- Supported agent types:
  - Exact Match: precise matching
  - Fuzzy Match: tolerant of typos (Levenshtein distance)
  - TF-IDF/BM25: term-weighted lexical similarity
  - ContextAgent: remembers previous dialogue turns
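To make the fuzzy-matching idea concrete, here is a minimal, standard-library-only sketch of Levenshtein distance. It illustrates the general technique and is not taken from airust's source; the function name `levenshtein` is chosen for this example.

```rust
/// Levenshtein edit distance between two strings, counted in Unicode scalar values.
/// Uses the classic two-row dynamic-programming formulation.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i chars of `a` and the first j chars of `b`
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let insertion = curr[j] + 1;
            let deletion = prev[j + 1] + 1;
            curr.push(substitution.min(insertion).min(deletion));
        }
        prev = curr;
    }
    prev[b.len()]
}
```

A fuzzy matcher can then accept the stored question with the smallest distance to the input, provided that distance stays under some limit.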
2. Manage Your Own Knowledge Database
- Save/load training data (`train.json`)
- Weighting and metadata per entry
- Legacy data can be imported
3. PDF Knowledge Extraction
- Convert PDF documents into structured knowledge bases
- Intelligent text chunking with configurable parameters
- Automatic metadata generation for search context
- Merge multiple PDF sources into unified knowledge
- Command-line tools for batch processing
4. Text Analysis
- Tokenization, stop words, N-grams
- Similarity measures: Levenshtein, Jaccard
- Text normalization
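As an illustration of the token-based similarity measures listed above, the following sketch computes Jaccard similarity over lowercase word tokens. The helper names `tokenize` and `jaccard` are chosen for this example and are not claimed to match airust's own functions; stop-word removal is left out for brevity.

```rust
use std::collections::HashSet;

/// Lowercases the text and splits it into alphanumeric tokens.
fn tokenize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// Jaccard similarity: |A ∩ B| / |A ∪ B| over the two token sets.
fn jaccard(a: &str, b: &str) -> f64 {
    let sa: HashSet<String> = tokenize(a).into_iter().collect();
    let sb: HashSet<String> = tokenize(b).into_iter().collect();
    if sa.is_empty() && sb.is_empty() {
        return 1.0; // convention chosen here: two empty inputs count as identical
    }
    let intersection = sa.intersection(&sb).count() as f64;
    let union = sa.union(&sb).count() as f64;
    intersection / union
}
```

Unlike Levenshtein distance, this measure ignores word order and character-level typos; it only asks how many distinct words the two inputs share.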
5. Custom CLI Tools
- Launch the `airust` CLI for:
  - Interactive sessions with an agent
  - Knowledge base management
  - Quick data testing
  - PDF conversion and import
6. Integration into Other Projects
- Use `airust` as a Rust library in your own applications (Web, CLI, Desktop, IoT)
Example Application Ideas:
- FAQ bot for your website
- Intelligent document search
- Customer support via terminal
- Voice assistant with context understanding
- Similarity search for text databases
- Local assistance tool for developer documentation
- Smart PDF document analyzer and query system
Advanced Features
- Modular Architecture with Unified Traits:
  - `Agent`: base trait for all agents with enhanced prediction capabilities
  - `TrainableAgent`: for agents that can be trained with examples
  - `ContextualAgent`: for context-aware conversational agents
  - `ConfidenceAgent`: new trait for agents that can report prediction confidence
- Intelligent Agent Implementations:
  - `MatchAgent`: advanced matching with configurable strategies
    - Exact matching
    - Fuzzy matching with dynamic thresholds
    - Configurable Levenshtein distance options
  - `TfidfAgent`: similarity detection using the BM25 algorithm
    - Customizable term frequency scaling
    - Document length normalization
  - `ContextAgent<A>`: flexible context-aware wrapper
    - Multiple context formatting strategies
    - Configurable context history size
- Enhanced Response Handling:
  - `ResponseFormat` with support for:
    - Plain text
    - Markdown
    - JSON
  - Metadata and confidence tracking
  - Seamless type conversions
- Intelligent Knowledge Base:
  - Compile-time knowledge via `train.json`
  - Runtime knowledge expansion
  - Backward compatibility with legacy formats
  - Weighted training examples
  - Optional metadata support
- PDF Processing and Knowledge Extraction:
  - `PdfLoader` with configurable extraction parameters:
    - Min/max chunk sizes for optimal text segmentation
    - Chunk overlap for context preservation
    - Sentence-aware splitting for natural text boundaries
  - Intelligent PDF text extraction
  - Automatic training example generation from PDF content
  - PDF metadata preservation
  - Command-line tools for batch processing
  - Multi-document knowledge base merging
- Advanced Text Processing:
  - Tokenization with Unicode support
  - Stopword removal
  - Text normalization
  - N-gram generation
  - Advanced string similarity metrics
    - Levenshtein distance
    - Jaccard similarity
- Unified CLI Tool:
  - Interactive mode
  - Multiple agent type selection
  - Knowledge base management
  - Flexible querying
  - PDF import and conversion
Usage
Integration in other projects
[dependencies]
airust = "0.1.5"
Sample Code (Updated)
use airust::{Agent, TrainableAgent, MatchAgent, ResponseFormat, KnowledgeBase};
fn main() {
    // Load embedded knowledge base
    let kb = KnowledgeBase::from_embedded();
    // Create and train agent
    let mut agent = MatchAgent::new_exact();
    agent.train(kb.get_examples());
    // Ask a question
    let answer = agent.predict("What is airust?");
    // Print the response (converted from ResponseFormat to String)
    println!("Answer: {}", String::from(answer));
}
Training Data Format
The `knowledge/train.json` file format has been extended and accepts both the legacy and the new format. The new format looks like this:
[
  {
    "input": "What is airust?",
    "output": {
      "Text": "A modular AI library in Rust."
    },
    "weight": 2.0
  },
  {
    "input": "What agents are available?",
    "output": {
      "Markdown": "- **MatchAgent** (exact & fuzzy)\n- **TfidfAgent** (BM25)\n- **ContextAgent** (context-aware)"
    },
    "weight": 1.0
  }
]
Legacy format is still supported for backward compatibility.
CLI Usage
# Simple query
airust query simple "What is airust?"
airust query fuzzy "What is airust?"
airust query tfidf "Explain airust"
# Interactive mode
airust interactive
# Knowledge base management
airust knowledge
PDF Conversion and Import
AIRust includes powerful tools for converting PDF documents into structured knowledge bases:
Using the PDF2KB Tool
# Convert a PDF file to a knowledge base with default settings
cargo run --bin pdf2kb path/to/document.pdf
# Specify custom output location
cargo run --bin pdf2kb path/to/document.pdf custom/output/path.json
# With custom chunk parameters
cargo run --bin pdf2kb path/to/document.pdf --min-chunk 100 --max-chunk 2000 --overlap 300
# Additional options
cargo run --bin pdf2kb path/to/document.pdf --weight 1.5 --no-metadata --no-sentence-split
Using AIRust's PDF Import Feature
# Import PDF directly through AIRust
cargo run --bin airust -- import-pdf path/to/document.pdf
Merging Multiple Knowledge Bases
After converting multiple PDFs to knowledge bases, merge them into a unified knowledge source:
# Merge all JSON files in the knowledge/ directory
cargo run --bin merge_kb
PDF Processing Configuration Options
- `--min-chunk <size>`: Minimum chunk size in characters (default: 50)
- `--max-chunk <size>`: Maximum chunk size in characters (default: 1000)
- `--overlap <size>`: Overlap between chunks in characters (default: 200)
- `--weight <value>`: Weight for generated training examples (default: 1.0)
- `--no-metadata`: Disable inclusion of metadata in training examples
- `--no-sentence-split`: Disable sentence boundary detection for chunking
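The interplay of the chunk-size and overlap options can be sketched as a sliding window over the text. This is a simplified, character-based illustration under stated assumptions, not airust's actual chunker: the function name `chunk_text` is hypothetical, and sentence-aware splitting is deliberately left out.

```rust
/// Splits `text` into chunks of at most `max_chunk` characters, where each chunk
/// starts `max_chunk - overlap` characters after the previous one.
/// Chunks shorter than `min_chunk` are dropped.
fn chunk_text(text: &str, min_chunk: usize, max_chunk: usize, overlap: usize) -> Vec<String> {
    assert!(max_chunk > overlap, "overlap must be smaller than max_chunk");
    let chars: Vec<char> = text.chars().collect();
    let step = max_chunk - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + max_chunk).min(chars.len());
        if end - start >= min_chunk {
            chunks.push(chars[start..end].iter().collect());
        }
        if end == chars.len() {
            break; // the last window reached the end of the text
        }
        start += step;
    }
    chunks
}
```

Because consecutive chunks share `overlap` characters, a sentence cut at a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.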
Advanced Usage: Context Agent
// ContextFormat is assumed to be re-exported at the crate root like the other types
use airust::{Agent, TrainableAgent, ContextualAgent, TfidfAgent, ContextAgent, ContextFormat, KnowledgeBase};
fn main() {
    // Load embedded knowledge base
    let kb = KnowledgeBase::from_embedded();
    // Create and train base agent
    let mut base_agent = TfidfAgent::new()
        .with_bm25_params(1.5, 0.8); // Custom BM25 tuning
    base_agent.train(kb.get_examples());
    // Wrap in a context-aware agent (remembering 3 turns)
    let mut agent = ContextAgent::new(base_agent, 3)
        .with_context_format(ContextFormat::List);
    // First question
    let answer1 = agent.predict("What is airust?");
    println!("A1: {}", String::from(answer1.clone()));
    // Add to context history
    agent.add_context("What is airust?".to_string(), answer1);
    // Follow-up question
    let answer2 = agent.predict("What features does it provide?");
    println!("A2: {}", String::from(answer2));
}
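For readers unfamiliar with the two numbers passed to `with_bm25_params`, the sketch below shows the standard Okapi BM25 formula with its usual parameters `k1` (term frequency saturation) and `b` (document length normalization). That `1.5` and `0.8` map to `k1` and `b` in that order is an assumption, and the function `bm25_score` is written for this example rather than taken from airust.

```rust
use std::collections::HashMap;

/// Scores one tokenized document against a query with the Okapi BM25 formula.
/// `docs` is the whole tokenized corpus (needed for IDF and average length)
/// and is assumed to be non-empty.
fn bm25_score(query: &[&str], doc: &[&str], docs: &[Vec<&str>], k1: f64, b: f64) -> f64 {
    let n = docs.len() as f64;
    let avg_len = docs.iter().map(|d| d.len()).sum::<usize>() as f64 / n;
    // Term frequencies inside the scored document.
    let mut tf: HashMap<&str, f64> = HashMap::new();
    for term in doc {
        *tf.entry(*term).or_insert(0.0) += 1.0;
    }
    query
        .iter()
        .map(|term| {
            let f = tf.get(term).copied().unwrap_or(0.0);
            // Number of documents containing the term.
            let df = docs.iter().filter(|d| d.contains(term)).count() as f64;
            // Smoothed IDF variant that never goes negative.
            let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
            // k1 caps the gain from repeated terms; b scales the length penalty.
            idf * f * (k1 + 1.0) / (f + k1 * (1.0 - b + b * doc.len() as f64 / avg_len))
        })
        .sum()
}
```

Raising `k1` lets repeated terms keep adding to the score for longer, while `b = 0.0` disables length normalization and `b = 1.0` applies it fully.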
PDF Knowledge Extraction Example
use airust::{PdfLoader, PdfLoaderConfig, KnowledgeBase, TfidfAgent, Agent, TrainableAgent};
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a custom PDF loader configuration
    let config = PdfLoaderConfig {
        min_chunk_size: 100,
        max_chunk_size: 1500,
        chunk_overlap: 250,
        default_weight: 1.2,
        include_metadata: true,
        split_by_sentence: true,
    };
    // Initialize the loader with custom configuration
    let loader = PdfLoader::with_config(config);
    // Convert PDF to a knowledge base
    let kb = loader.pdf_to_knowledge_base("documents/technical-paper.pdf")?;
    println!("Extracted {} training examples", kb.get_examples().len());
    // Create and train an agent with the extracted knowledge
    let mut agent = TfidfAgent::new();
    agent.train(kb.get_examples());
    // Ask questions about the PDF content
    let answer = agent.predict("What are the main findings in the paper?");
    println!("Answer: {}", String::from(answer));
    // Save the knowledge base for future use
    kb.save(Some("knowledge/technical-paper.json".into()))?;
    Ok(())
}
New in Version 0.1.5
Matching Strategies
// Configurable fuzzy matching
let agent = MatchAgent::new(MatchingStrategy::Fuzzy(FuzzyOptions {
    max_distance: Some(5),      // Maximum Levenshtein distance
    threshold_factor: Some(0.2) // Dynamic length-based threshold
}));
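How the two options might combine is sketched below. The rule used here (a length-proportional limit capped by `max_distance`) is an assumption made for illustration, since the source does not spell out the exact formula; the local `FuzzyOptions` struct only mirrors the two field names shown above, with assumed field types, and `allowed_distance` is a hypothetical helper.

```rust
/// Local stand-in that mirrors the two fields shown above; the field types are assumed.
struct FuzzyOptions {
    max_distance: Option<usize>,
    threshold_factor: Option<f32>,
}

/// Largest edit distance still accepted for an input of `input_len` characters.
/// Assumed rule: the length-based limit applies first, and `max_distance` caps it.
fn allowed_distance(input_len: usize, opts: &FuzzyOptions) -> usize {
    let dynamic = opts
        .threshold_factor
        .map(|f| (input_len as f32 * f).floor() as usize);
    match (dynamic, opts.max_distance) {
        (Some(d), Some(max)) => d.min(max),
        (Some(d), None) => d,
        (None, Some(max)) => max,
        (None, None) => 0, // no tolerance configured: behave like exact matching
    }
}
```

Under this rule a factor of 0.2 tolerates roughly one edit per five input characters, so short questions must match almost exactly while long ones are never allowed more than `max_distance` edits.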
Context Formatting
// Multiple context representation strategies
let context_agent = ContextAgent::new(base_agent, 3)
    .with_context_format(ContextFormat::List);
// Other formats: QAPairs, Sentence, Custom
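The idea of a bounded history that is rendered into the next query can be sketched with a `VecDeque`. This is one plausible approach, not a description of how `ContextAgent` works internally; the `History` type and the `Q: ... A: ...` layout are assumptions for this example.

```rust
use std::collections::VecDeque;

/// Keeps the most recent `max_turns` question/answer pairs.
struct History {
    turns: VecDeque<(String, String)>,
    max_turns: usize,
}

impl History {
    fn new(max_turns: usize) -> Self {
        Self { turns: VecDeque::new(), max_turns }
    }

    /// Stores a turn and evicts the oldest one once the limit is exceeded.
    fn add(&mut self, question: &str, answer: &str) {
        self.turns.push_back((question.to_string(), answer.to_string()));
        while self.turns.len() > self.max_turns {
            self.turns.pop_front();
        }
    }

    /// Prefixes the new question with the remembered turns as Q/A pairs.
    fn contextualize(&self, question: &str) -> String {
        let mut out = String::new();
        for (q, a) in &self.turns {
            out.push_str(&format!("Q: {q} A: {a}\n"));
        }
        out.push_str(question);
        out
    }
}
```

The enriched string can then be handed to the wrapped agent, which lets a follow-up such as "What features does it provide?" match against terms from the earlier turns.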
Advanced Text Utilities
// Text processing capabilities
let text = "Hello, world!";
let tokens = text_utils::tokenize(text);
let unique_terms = text_utils::unique_terms(text);
let ngrams = text_utils::create_ngrams(text, 2); // n = 2
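For reference, n-gram generation can be sketched in a few lines of standard-library Rust. Whether `create_ngrams` works on words or characters is not stated here, so this example assumes word-level n-grams; `word_ngrams` is a name chosen for the sketch.

```rust
/// Builds word-level n-grams: every run of `n` consecutive tokens, joined by a space.
fn word_ngrams(text: &str, n: usize) -> Vec<String> {
    let tokens: Vec<&str> = text.split_whitespace().collect();
    if n == 0 || tokens.len() < n {
        return Vec::new(); // nothing to build; also avoids the panic in `windows(0)`
    }
    tokens.windows(n).map(|w| w.join(" ")).collect()
}
```

Comparing bigrams instead of single words keeps some word-order information, which helps distinguish inputs that share vocabulary but not phrasing.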
PDF Processing
// Advanced PDF configuration
let config = PdfLoaderConfig {
    min_chunk_size: 100,
    max_chunk_size: 1500,
    chunk_overlap: 250,
    default_weight: 1.2,
    include_metadata: true,
    split_by_sentence: true,
};
let loader = PdfLoader::with_config(config);
// Convert PDF to knowledge base
let kb = loader.pdf_to_knowledge_base("path/to/document.pdf")?;
License
MIT
Built with ❤️ in Rust.
Contributions and extensions are welcome!
Migration Guide for airust 0.1.5
This guide helps you migrate from earlier 0.1.x releases of airust to 0.1.5.
1. Trait and Type Changes
New Trait Hierarchy
trait Agent {
    fn predict(&self, input: &str) -> ResponseFormat;
}
trait TrainableAgent: Agent {
    fn train(&mut self, data: &[TrainingExample]);
}
trait ContextualAgent: Agent {
    fn add_context(&mut self, question: String, answer: ResponseFormat);
}
New Response Format
let answer: ResponseFormat = agent.predict("Question");
let answer_string: String = String::from(answer);
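The `String::from(answer)` conversion relies on a `From` implementation. The sketch below shows how such a conversion can be written for an enum with the formats named in this document. It is a stand-in, not airust's definition: the `Json` variant name and the `String` payloads are assumptions, and the real type also carries metadata that is omitted here.

```rust
/// Stand-in enum with the formats named in this document; payload types are assumed.
#[derive(Debug, Clone, PartialEq)]
enum ResponseFormat {
    Text(String),
    Markdown(String),
    Json(String),
}

impl From<ResponseFormat> for String {
    /// Every variant collapses to its inner text.
    fn from(response: ResponseFormat) -> String {
        match response {
            ResponseFormat::Text(s)
            | ResponseFormat::Markdown(s)
            | ResponseFormat::Json(s) => s,
        }
    }
}

impl From<&str> for ResponseFormat {
    /// Plain strings become `Text` responses.
    fn from(s: &str) -> Self {
        ResponseFormat::Text(s.to_string())
    }
}
```

With conversions in both directions, callers that only need text can ignore the format, while callers that render Markdown or JSON can still match on the variant.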
Updated TrainingExample Struct
struct TrainingExample {
    input: String,
    output: ResponseFormat,
    weight: f32,
}
2. Agent Replacements
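SimpleAgent and FuzzyAgent → MatchAgent
// Replaces SimpleAgent: exact matching
let mut agent = MatchAgent::new_exact();
// Replaces FuzzyAgent: fuzzy matching with default options
let mut agent = MatchAgent::new_fuzzy();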
With options:
let mut agent = MatchAgent::new(MatchingStrategy::Fuzzy(FuzzyOptions {
    max_distance: Some(5),
    threshold_factor: Some(0.2),
}));
ContextAgent is Now Generic
let mut base_agent = TfidfAgent::new();
base_agent.train(&data);
let mut agent = ContextAgent::new(base_agent, 5);
StructuredAgent Removed (use ResponseFormat)
3. Knowledge Base Changes
// Load the embedded (compile-time) knowledge base
let kb = KnowledgeBase::from_embedded();
let data = kb.get_examples();
// Or build a knowledge base at runtime
let mut kb = KnowledgeBase::new();
kb.add_example("Question".to_string(), "Answer".to_string(), 1.0);
4. CLI Tool Migration
cargo run --bin airust -- query simple "What is airust?"
cargo run --bin airust -- interactive
cargo run --bin airust -- knowledge
5. New PDF Processing Tools
# Convert PDFs to knowledge bases
cargo run --bin pdf2kb document.pdf
# Import PDF directly in AIRust
cargo run --bin airust -- import-pdf document.pdf
# Merge PDF-derived knowledge bases
cargo run --bin merge_kb
6. Recommendations
- Upgrade your dependencies
- Use the new `lib.rs` re-exports
- Test thoroughly
- Explore new context formatting
- Try PDF knowledge extraction for document analysis