#artificial-intelligence #machine-learning #compile-time #knowledge

bin+lib airust

Trainable, modular AI engine in Rust with compile-time knowledge

7 releases

new 0.1.6 Apr 1, 2025
0.1.5 Mar 31, 2025

#217 in Machine learning

329 downloads per month

MIT license

87KB
1.5K SLoC

airust

🧠 airust is a modular, trainable AI library written in Rust.
It supports compile-time knowledge through JSON files and provides several prediction engines (exact, fuzzy, and TF-IDF/BM25 matching) for natural language input.


🚀 AiRust Capabilities

✅ What You Can Concretely Do:

🧠 1. Build Your Own AI Agents

  • Train agents with examples (Question → Answer)
  • Supported Agent Types:
    • Exact Match – precise matching
    • Fuzzy Match – tolerant to typos (Levenshtein)
    • TF-IDF/BM25 – term-weighted similarity ranking
    • ContextAgent – remembers previous dialogue turns
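
To make the fuzzy-matching idea concrete, here is a minimal standalone sketch of typo-tolerant lookup based on Levenshtein distance. It does not use the airust API; `fuzzy_lookup` and its `max_distance` parameter are hypothetical names chosen for illustration, and the crate's own implementation may differ.

```rust
/// Levenshtein edit distance between two strings, computed over Unicode
/// scalar values with a two-row dynamic programming table.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr.push(substitution.min(deletion).min(insertion));
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Hypothetical helper: return the answer whose question is closest to `input`
/// (case-insensitive), provided the distance does not exceed `max_distance`.
fn fuzzy_lookup<'a>(
    examples: &'a [(&'a str, &'a str)],
    input: &str,
    max_distance: usize,
) -> Option<&'a str> {
    let needle = input.to_lowercase();
    examples
        .iter()
        .map(|(question, answer)| (levenshtein(&question.to_lowercase(), &needle), *answer))
        .filter(|(distance, _)| *distance <= max_distance)
        .min_by_key(|(distance, _)| *distance)
        .map(|(_, answer)| answer)
}
```

With a budget of two edits, a query such as "What is airsut?" (two swapped letters) would still resolve to the answer stored for "What is airust?", while an unrelated query returns `None`.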

💬 2. Manage Your Own Knowledge Database

  • Save/load training data (train.json)
  • Weighting and metadata per entry
  • Legacy data can be imported

📄 3. PDF Knowledge Extraction

  • Convert PDF documents into structured knowledge bases
  • Intelligent text chunking with configurable parameters
  • Automatic metadata generation for search context
  • Merge multiple PDF sources into unified knowledge
  • Command-line tools for batch processing

🧪 4. Text Analysis

  • Tokenization, stop words, N-grams
  • Similarity measures: Levenshtein, Jaccard
  • Text normalization
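
As an illustration of how normalization, stop-word removal, and Jaccard similarity fit together, the following sketch uses only the standard library. The stop-word list and the function names are assumptions made for this example and are not taken from airust's `text_utils` module.

```rust
use std::collections::HashSet;

// Tiny illustrative stop-word list; a real list would be much longer.
const STOP_WORDS: [&str; 6] = ["a", "an", "the", "is", "of", "in"];

/// Lowercase the text, split on non-alphanumeric characters, and drop stop words.
fn normalized_tokens(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|token| !token.is_empty() && !STOP_WORDS.iter().any(|stop| stop == token))
        .map(str::to_string)
        .collect()
}

/// Jaccard similarity: |A ∩ B| / |A ∪ B| over the two normalized token sets.
fn jaccard(a: &str, b: &str) -> f64 {
    let set_a: HashSet<String> = normalized_tokens(a).into_iter().collect();
    let set_b: HashSet<String> = normalized_tokens(b).into_iter().collect();
    let union = set_a.union(&set_b).count();
    if union == 0 {
        return 0.0; // both inputs are empty after normalization
    }
    set_a.intersection(&set_b).count() as f64 / union as f64
}
```

Unlike Levenshtein distance, which compares character sequences, Jaccard similarity ignores word order and only looks at which terms the two texts share.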

🧰 5. Custom CLI Tools

  • Launch the airust CLI for:
    • Interactive sessions with an agent
    • Knowledge base management
    • Quick data testing
    • PDF conversion and import

🌐 6. Integration into Other Projects

  • Use airust as a Rust library in your own applications (Web, CLI, Desktop, IoT)

🔧 Example Application Ideas:

  • 🤖 FAQ Bot for your website
  • 📚 Intelligent document search
  • 🧾 Customer support via terminal
  • 🗣️ Voice assistant with context understanding
  • 🔎 Similarity search for text databases
  • 🛠 Local assistance tool for developer documentation
  • 📑 Smart PDF document analyzer and query system

🚀 Advanced Features

  • 🧩 Modular Architecture with Unified Traits:

    • Agent – Base trait for all agents with enhanced prediction capabilities
    • TrainableAgent – For agents that can be trained with examples
    • ContextualAgent – For context-aware conversational agents
    • ConfidenceAgent – New trait for agents that can provide prediction confidence
  • 🧠 Intelligent Agent Implementations:

    • MatchAgent – Advanced matching with configurable strategies
      • Exact matching
      • Fuzzy matching with dynamic thresholds
      • Configurable Levenshtein distance options
    • TfidfAgent – Similarity ranking using the BM25 algorithm
      • Customizable term frequency scaling
      • Document length normalization
    • ContextAgent<A> – Flexible context-aware wrapper
      • Multiple context formatting strategies
      • Configurable context history size
  • 📝 Enhanced Response Handling:

    • ResponseFormat with support for:
      • Plain text
      • Markdown
      • JSON
    • Metadata and confidence tracking
    • Seamless type conversions
  • 💾 Intelligent Knowledge Base:

    • Compile-time knowledge via train.json
    • Runtime knowledge expansion
    • Backward compatibility with legacy formats
    • Weighted training examples
    • Optional metadata support
  • 📄 PDF Processing and Knowledge Extraction:

    • PdfLoader with configurable extraction parameters:
      • Min/max chunk sizes for optimal text segmentation
      • Chunk overlap for context preservation
      • Sentence-aware splitting for natural text boundaries
    • Intelligent PDF text extraction
    • Automatic training example generation from PDF content
    • PDF metadata preservation
    • Command-line tools for batch processing
    • Multi-document knowledge base merging
  • 🔍 Advanced Text Processing:

    • Tokenization with Unicode support
    • Stopword removal
    • Text normalization
    • N-gram generation
    • Advanced string similarity metrics
      • Levenshtein distance
      • Jaccard similarity
  • 🛠️ Unified CLI Tool:

    • Interactive mode
    • Multiple agent type selection
    • Knowledge base management
    • Flexible querying
    • PDF import and conversion

🔧 Usage

Integration in other projects

[dependencies]
airust = "0.1.5"

Sample Code (Updated)

use airust::{Agent, TrainableAgent, MatchAgent, ResponseFormat, KnowledgeBase};

fn main() {
    // Load embedded knowledge base
    let kb = KnowledgeBase::from_embedded();

    // Create and train agent
    let mut agent = MatchAgent::new_exact();
    agent.train(kb.get_examples());

    // Ask a question
    let answer = agent.predict("What is airust?");

    // Print the response (converted from ResponseFormat to String)
    println!("Answer: {}", String::from(answer));
}

📂 Training Data Format

The format of knowledge/train.json has been extended; both the old and the new format are supported:

[
  {
    "input": "What is airust?",
    "output": {
      "Text": "A modular AI library in Rust."
    },
    "weight": 2.0
  },
  {
    "input": "What agents are available?",
    "output": {
      "Markdown": "- **MatchAgent** (exact & fuzzy)\n- **TfidfAgent** (BM25)\n- **ContextAgent** (context-aware)"
    },
    "weight": 1.0
  }
]

Legacy format is still supported for backward compatibility.


🖥️ CLI Usage

# Simple query
airust query simple "What is airust?"
airust query fuzzy "What is airust?"
airust query tfidf "Explain airust"

# Interactive mode
airust interactive

# Knowledge base management
airust knowledge

📄 PDF Conversion and Import

AIRust includes powerful tools for converting PDF documents into structured knowledge bases:

Using the PDF2KB Tool

# Convert a PDF file to a knowledge base with default settings
cargo run --bin pdf2kb path/to/document.pdf

# Specify custom output location
cargo run --bin pdf2kb path/to/document.pdf custom/output/path.json

# With custom chunk parameters
cargo run --bin pdf2kb path/to/document.pdf --min-chunk 100 --max-chunk 2000 --overlap 300

# Additional options
cargo run --bin pdf2kb path/to/document.pdf --weight 1.5 --no-metadata --no-sentence-split

Using AIRust's PDF Import Feature

# Import PDF directly through AIRust
cargo run --bin airust -- import-pdf path/to/document.pdf

Merging Multiple Knowledge Bases

After converting multiple PDFs to knowledge bases, merge them into a unified knowledge source:

# Merge all JSON files in the knowledge/ directory
cargo run --bin merge_kb

PDF Processing Configuration Options

  • --min-chunk <size>: Minimum chunk size in characters (default: 50)
  • --max-chunk <size>: Maximum chunk size in characters (default: 1000)
  • --overlap <size>: Overlap between chunks in characters (default: 200)
  • --weight <value>: Weight for generated training examples (default: 1.0)
  • --no-metadata: Disable inclusion of metadata in training examples
  • --no-sentence-split: Disable sentence boundary detection for chunking

📊 Advanced Usage – Context Agent

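// ContextFormat is used below; it is assumed to be re-exported at the crate root like the other types.
use airust::{Agent, TrainableAgent, ContextualAgent, TfidfAgent, ContextAgent, ContextFormat, KnowledgeBase};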

fn main() {
    // Load embedded knowledge base
    let kb = KnowledgeBase::from_embedded();

    // Create and train base agent
    let mut base_agent = TfidfAgent::new()
        .with_bm25_params(1.5, 0.8);  // Custom BM25 tuning
    base_agent.train(kb.get_examples());

    // Wrap in a context-aware agent (remembering 3 turns)
    let mut agent = ContextAgent::new(base_agent, 3)
        .with_context_format(ContextFormat::List);

    // First question
    let answer1 = agent.predict("What is airust?");
    println!("A1: {}", String::from(answer1.clone()));

    // Add to context history
    agent.add_context("What is airust?".to_string(), answer1);

    // Follow-up question
    let answer2 = agent.predict("What features does it provide?");
    println!("A2: {}", String::from(answer2));
}
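
The two values passed to `with_bm25_params` above are presumably the usual BM25 parameters k1 (term-frequency scaling) and b (document length normalization); the argument order is an assumption. To show what these parameters do, here is a standalone sketch of Okapi BM25 scoring with a non-negative IDF variant. The function names are illustrative and the crate's TfidfAgent may compute scores differently.

```rust
use std::collections::HashMap;

/// Whitespace tokenizer with lowercasing; enough for a sketch.
fn tokenize(text: &str) -> Vec<String> {
    text.to_lowercase().split_whitespace().map(str::to_string).collect()
}

/// Score every document against `query` with Okapi BM25.
/// `k1` controls term-frequency saturation, `b` controls length normalization.
fn bm25_scores(docs: &[&str], query: &str, k1: f64, b: f64) -> Vec<f64> {
    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
    let total_len: usize = tokenized.iter().map(Vec::len).sum();
    if total_len == 0 {
        return vec![0.0; docs.len()]; // nothing to score against
    }
    let n_docs = tokenized.len() as f64;
    let avg_len = total_len as f64 / n_docs;

    // Document frequency: in how many documents does each term occur?
    let mut doc_freq: HashMap<&str, f64> = HashMap::new();
    for doc in &tokenized {
        let mut seen: Vec<&str> = doc.iter().map(String::as_str).collect();
        seen.sort_unstable();
        seen.dedup();
        for term in seen {
            *doc_freq.entry(term).or_insert(0.0) += 1.0;
        }
    }

    let query_terms = tokenize(query);
    tokenized
        .iter()
        .map(|doc| {
            let len_norm = 1.0 - b + b * doc.len() as f64 / avg_len;
            query_terms
                .iter()
                .map(|term| {
                    let tf = doc.iter().filter(|t| *t == term).count() as f64;
                    if tf == 0.0 {
                        0.0
                    } else {
                        let df = doc_freq.get(term.as_str()).copied().unwrap_or(0.0);
                        // Non-negative IDF variant: ln((N - df + 0.5) / (df + 0.5) + 1).
                        let idf = ((n_docs - df + 0.5) / (df + 0.5) + 1.0).ln();
                        idf * tf * (k1 + 1.0) / (tf + k1 * len_norm)
                    }
                })
                .sum::<f64>()
        })
        .collect()
}
```

A larger k1 lets repeated terms keep adding to the score for longer, and a b closer to 1.0 penalizes long documents more strongly.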

📄 PDF Knowledge Extraction Example

use airust::{PdfLoader, PdfLoaderConfig, KnowledgeBase, TfidfAgent, Agent, TrainableAgent};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a custom PDF loader configuration
    let config = PdfLoaderConfig {
        min_chunk_size: 100,
        max_chunk_size: 1500,
        chunk_overlap: 250,
        default_weight: 1.2,
        include_metadata: true,
        split_by_sentence: true,
    };

    // Initialize the loader with custom configuration
    let loader = PdfLoader::with_config(config);

    // Convert PDF to a knowledge base
    let kb = loader.pdf_to_knowledge_base("documents/technical-paper.pdf")?;
    println!("Extracted {} training examples", kb.get_examples().len());

    // Create and train an agent with the extracted knowledge
    let mut agent = TfidfAgent::new();
    agent.train(kb.get_examples());

    // Ask questions about the PDF content
    let answer = agent.predict("What are the main findings in the paper?");
    println!("Answer: {}", String::from(answer));

    // Save the knowledge base for future use
    kb.save(Some("knowledge/technical-paper.json".into()))?;

    Ok(())
}

🚀 New in Version 0.1.5

Matching Strategies

// Configurable fuzzy matching
let agent = MatchAgent::new(MatchingStrategy::Fuzzy(FuzzyOptions {
    max_distance: Some(5),      // Maximum Levenshtein distance
    threshold_factor: Some(0.2) // Dynamic length-based threshold
}));
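
How `max_distance` and `threshold_factor` combine is not spelled out here. One plausible reading, sketched below purely as an assumption, is that the factor scales the allowed edit distance with the input length while `max_distance` acts as an upper cap. The `FuzzyOptions` struct in this sketch is a local stand-in, not the crate's type, and `allowed_distance` is a hypothetical helper.

```rust
/// Local stand-in mirroring the option names shown above; not the crate's actual type.
struct FuzzyOptions {
    max_distance: Option<usize>,
    threshold_factor: Option<f64>,
}

/// Assumed semantics: allow `threshold_factor * input length` edits (rounded up),
/// capped by `max_distance` when both are set.
fn allowed_distance(opts: &FuzzyOptions, input: &str) -> usize {
    let len = input.chars().count();
    let dynamic = opts
        .threshold_factor
        .map(|factor| (factor * len as f64).ceil() as usize);
    match (dynamic, opts.max_distance) {
        (Some(d), Some(max)) => d.min(max),
        (Some(d), None) => d,
        (None, Some(max)) => max,
        (None, None) => 0, // no tolerance configured: exact matches only
    }
}
```

Under this reading, short inputs tolerate only a few typos, while long inputs never exceed the fixed cap.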

Context Formatting

// Multiple context representation strategies
let context_agent = ContextAgent::new(base_agent, 3)
    .with_context_format(ContextFormat::List);
    // Other formats: QAPairs, Sentence, Custom
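
To illustrate what a bounded context history with selectable formatting could look like, here is a standalone sketch. The variant names follow the ones mentioned above, but the concrete output of each format, the `History` type, and its methods are assumptions for illustration only.

```rust
use std::collections::VecDeque;

/// Formatting strategies named after the variants mentioned above;
/// the exact output of each is an assumption.
enum ContextFormat {
    QAPairs,
    List,
    Sentence,
}

/// Bounded question/answer history; the oldest turn is evicted first.
struct History {
    turns: VecDeque<(String, String)>,
    max_turns: usize,
}

impl History {
    fn new(max_turns: usize) -> Self {
        Self { turns: VecDeque::new(), max_turns }
    }

    fn add(&mut self, question: &str, answer: &str) {
        if self.max_turns == 0 {
            return; // a history of size zero remembers nothing
        }
        while self.turns.len() >= self.max_turns {
            self.turns.pop_front();
        }
        self.turns.push_back((question.to_string(), answer.to_string()));
    }

    /// Prefix the new question with the remembered turns, one per line.
    fn contextualize(&self, style: &ContextFormat, question: &str) -> String {
        if self.turns.is_empty() {
            return question.to_string();
        }
        let context: Vec<String> = self
            .turns
            .iter()
            .map(|(q, a)| match style {
                ContextFormat::QAPairs => format!("Q: {q} A: {a}"),
                ContextFormat::List => format!("- {q}: {a}"),
                ContextFormat::Sentence => format!("Previously, \"{q}\" was answered with \"{a}\"."),
            })
            .collect();
        format!("{}\n{}", context.join("\n"), question)
    }
}
```

The combined string would then be handed to the wrapped agent in place of the bare question, which is how a follow-up such as "What features does it provide?" can pick up terms from earlier turns.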

Advanced Text Utilities

// Text processing capabilities
let text = "Hello, world!";
let tokens = text_utils::tokenize(text);
let unique_terms = text_utils::unique_terms(text);
let ngrams = text_utils::create_ngrams(text, 2);
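
For readers unfamiliar with these operations, the sketch below shows one way tokenization, unique-term extraction, and n-gram generation can be implemented with the standard library. It assumes word-level n-grams and uses its own function names; the behaviour of the `text_utils` functions above may differ (for example, they might produce character n-grams).

```rust
use std::collections::BTreeSet;

/// Lowercase the text and split on anything that is not a letter or digit.
fn tokenize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|token| !token.is_empty())
        .map(str::to_string)
        .collect()
}

/// Sorted, de-duplicated terms of the text.
fn unique_terms(text: &str) -> Vec<String> {
    tokenize(text)
        .into_iter()
        .collect::<BTreeSet<_>>()
        .into_iter()
        .collect()
}

/// Word-level n-grams: every run of `n` consecutive tokens, joined by a space.
fn word_ngrams(text: &str, n: usize) -> Vec<String> {
    if n == 0 {
        return Vec::new(); // `windows(0)` would panic
    }
    tokenize(text).windows(n).map(|w| w.join(" ")).collect()
}
```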

PDF Processing

// Advanced PDF configuration
let config = PdfLoaderConfig {
    min_chunk_size: 100,
    max_chunk_size: 1500,
    chunk_overlap: 250,
    default_weight: 1.2,
    include_metadata: true,
    split_by_sentence: true,
};
let loader = PdfLoader::with_config(config);

// Convert PDF to knowledge base
let kb = loader.pdf_to_knowledge_base("path/to/document.pdf")?;

📃 License

MIT

Built with ❤️ in Rust.
Contributions and extensions are welcome!


🛠 Migration Guide for airust 0.1.5

This guide helps you migrate from earlier 0.1.x releases of airust to 0.1.5.

1. Trait and Type Changes

New Trait Hierarchy

trait Agent {
    fn predict(&self, input: &str) -> ResponseFormat;
}

trait TrainableAgent: Agent {
    fn train(&mut self, data: &[TrainingExample]);
}

trait ContextualAgent: Agent {
    fn add_context(&mut self, question: String, answer: ResponseFormat);
}

New Response Format

let answer: ResponseFormat = agent.predict("Question");
let answer_string: String = String::from(answer);

Updated TrainingExample Struct

struct TrainingExample {
    input: String,
    output: ResponseFormat,
    weight: f32,
}
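
To show how the trait hierarchy, ResponseFormat, and TrainingExample fit together, here is a self-contained sketch of a toy agent. The types are simplified local stand-ins (ResponseFormat is reduced to two variants), `LookupAgent` is hypothetical, and keeping the higher-weighted example on duplicate questions is an assumption, not documented airust behaviour.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the crate's types, reduced to what this sketch needs.
#[derive(Clone, Debug, PartialEq)]
enum ResponseFormat {
    Text(String),
    Markdown(String),
}

impl From<ResponseFormat> for String {
    fn from(response: ResponseFormat) -> String {
        match response {
            ResponseFormat::Text(s) | ResponseFormat::Markdown(s) => s,
        }
    }
}

struct TrainingExample {
    input: String,
    output: ResponseFormat,
    weight: f32,
}

trait Agent {
    fn predict(&self, input: &str) -> ResponseFormat;
}

trait TrainableAgent: Agent {
    fn train(&mut self, data: &[TrainingExample]);
}

/// Hypothetical exact-match agent: a case-insensitive lookup table.
#[derive(Default)]
struct LookupAgent {
    answers: HashMap<String, (ResponseFormat, f32)>,
}

impl Agent for LookupAgent {
    fn predict(&self, input: &str) -> ResponseFormat {
        match self.answers.get(&input.trim().to_lowercase()) {
            Some((answer, _)) => answer.clone(),
            None => ResponseFormat::Text("No matching answer found.".to_string()),
        }
    }
}

impl TrainableAgent for LookupAgent {
    fn train(&mut self, data: &[TrainingExample]) {
        for example in data {
            let key = example.input.trim().to_lowercase();
            // Assumed policy: on duplicate questions, keep the higher-weighted example.
            let replace = self
                .answers
                .get(&key)
                .map_or(true, |(_, weight)| example.weight > *weight);
            if replace {
                self.answers.insert(key, (example.output.clone(), example.weight));
            }
        }
    }
}
```

Because `predict` returns a ResponseFormat rather than a String, callers convert explicitly with `String::from(...)`, as in the snippets above.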

2. Agent Replacements

SimpleAgent and FuzzyAgent → MatchAgent

let mut agent = MatchAgent::new_exact();
let mut agent = MatchAgent::new_fuzzy();

With options:

let mut agent = MatchAgent::new(MatchingStrategy::Fuzzy(FuzzyOptions {
    max_distance: Some(5),
    threshold_factor: Some(0.2),
}));

ContextAgent is Now Generic

let mut base_agent = TfidfAgent::new();
base_agent.train(&data);
let mut agent = ContextAgent::new(base_agent, 5);

StructuredAgent Removed (use ResponseFormat)


3. Knowledge Base Changes

let kb = KnowledgeBase::from_embedded();
let data = kb.get_examples();

let mut kb = KnowledgeBase::new();
kb.add_example("Question".to_string(), "Answer".to_string(), 1.0);

4. CLI Tool Migration

cargo run --bin airust -- query simple "What is airust?"
cargo run --bin airust -- interactive
cargo run --bin airust -- knowledge

5. New PDF Processing Tools

# Convert PDFs to knowledge bases
cargo run --bin pdf2kb document.pdf

# Import PDF directly in AIRust
cargo run --bin airust -- import-pdf document.pdf

# Merge PDF-derived knowledge bases
cargo run --bin merge_kb

6. Recommendations

  • Upgrade your dependencies
  • Use new lib.rs re-exports
  • Test thoroughly
  • Explore new context formatting
  • Try PDF knowledge extraction for document analysis

Dependencies

~22–36MB
~421K SLoC