saorsa-ai

A unified, multi-provider LLM API for Rust with streaming, tool calling, and model metadata.


Overview

saorsa-ai provides a single, vendor-agnostic API for interacting with large language models from multiple providers:

  • Anthropic (Claude) - Messages API with native tool use
  • OpenAI - Chat Completions API with function calling
  • Google Gemini - GenerateContent API with function calling
  • Ollama - Local inference via NDJSON chat API
  • OpenAI-compatible - Azure OpenAI, Groq, Mistral, OpenRouter, xAI, Cerebras, and any other OpenAI-compatible endpoint

All providers share the same request/response types, streaming events, and tool calling interface. Switch providers by changing a config value - no code changes required.
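
For example, the provider kind can come from runtime configuration, and the rest of the code stays the same. A minimal sketch using the registry shown in Quick Start below (the SAORSA_* environment variable names are illustrative, not read by the crate):

use saorsa_ai::{ProviderConfig, ProviderKind, ProviderRegistry};

// Select the backend at startup; calling code is identical for every kind.
let kind = match std::env::var("SAORSA_PROVIDER").as_deref() {
    Ok("openai") => ProviderKind::OpenAi,
    Ok("ollama") => ProviderKind::Ollama,
    _ => ProviderKind::Anthropic,
};
let config = ProviderConfig::new(
    kind,
    std::env::var("SAORSA_API_KEY").unwrap_or_default(),
    std::env::var("SAORSA_MODEL").unwrap_or_else(|_| "claude-sonnet-4".into()),
);
let provider = ProviderRegistry::default().create(config)?;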

Quick Start

Add saorsa-ai to your Cargo.toml:

[dependencies]
saorsa-ai = "0.1"
tokio = { version = "1", features = ["full"] }

Non-Streaming Completion

use saorsa_ai::{
    CompletionRequest, Message, Provider, ProviderConfig, ProviderKind, ProviderRegistry,
};

#[tokio::main]
async fn main() -> saorsa_ai::Result<()> {
    let config = ProviderConfig::new(
        ProviderKind::Anthropic,
        std::env::var("ANTHROPIC_API_KEY").expect("set ANTHROPIC_API_KEY"),
        "claude-sonnet-4",
    );

    let registry = ProviderRegistry::default();
    let provider = registry.create(config)?;

    let request = CompletionRequest::new(
        "claude-sonnet-4",
        vec![Message::user("What is the capital of France?")],
        1024,
    );

    let response = provider.complete(request).await?;
    for block in &response.content {
        if let saorsa_ai::ContentBlock::Text { text } = block {
            println!("{text}");
        }
    }

    Ok(())
}

Streaming Completion

use saorsa_ai::{
    CompletionRequest, ContentDelta, Message, ProviderConfig, ProviderKind,
    ProviderRegistry, StreamEvent, StreamingProvider,
};

#[tokio::main]
async fn main() -> saorsa_ai::Result<()> {
    let config = ProviderConfig::new(
        ProviderKind::OpenAi,
        std::env::var("OPENAI_API_KEY").expect("set OPENAI_API_KEY"),
        "gpt-4o",
    );

    let registry = ProviderRegistry::default();
    let provider = registry.create(config)?;

    let request = CompletionRequest::new(
        "gpt-4o",
        vec![Message::user("Explain async/await in Rust")],
        2048,
    ).system("You are a helpful programming tutor.");

    let mut rx = provider.stream(request).await?;

    while let Some(event) = rx.recv().await {
        match event? {
            StreamEvent::ContentBlockDelta {
                delta: ContentDelta::TextDelta { text }, ..
            } => print!("{text}"),
            StreamEvent::MessageDelta { stop_reason, .. } => {
                if stop_reason.is_some() {
                    println!();
                }
            }
            _ => {}
        }
    }

    Ok(())
}

In-Process Local Inference (mistralrs / GGUF)

If you want to run fully in-process (single binary) without an external HTTP server, saorsa-ai provides an optional mistralrs-backed provider behind a feature flag.

Add the feature and the mistralrs dependency:

[dependencies]
saorsa-ai = { version = "0.1", features = ["mistralrs"] }
mistralrs = "0.7"
tokio = { version = "1", features = ["full"] }

Default download/cache location for model files (Hugging Face hub cache):

  • HF_HOME/hub if HF_HOME is set
  • otherwise ~/.cache/huggingface/hub

Then load the model and stream a completion:

use std::sync::Arc;

use saorsa_ai::{CompletionRequest, Message, MistralrsConfig, MistralrsProvider, StreamingProvider};

#[tokio::main]
async fn main() -> saorsa_ai::Result<()> {
    // Load a GGUF model (downloads/caches under the HF hub cache).
    // Provide the HF repo id + GGUF filename(s).
    let model = mistralrs::GgufModelBuilder::new(
        "TheBloke/CodeLlama-7B-Instruct-GGUF".to_string(),
        vec!["codellama-7b-instruct.Q4_K_M.gguf".to_string()],
    )
    .with_force_cpu()
    .build()
    .await
    .map_err(|e| saorsa_ai::SaorsaAiError::Provider {
        provider: "mistralrs".into(),
        message: e.to_string(),
    })?;

    let provider = MistralrsProvider::new(Arc::new(model), MistralrsConfig::default());

    // MVP: text-only (no tools).
    let request = CompletionRequest::new(
        "local",
        vec![Message::user("Write a short Rust function that adds two i32 values.")],
        256,
    )
    .system("You are a helpful programming assistant.");

    let mut rx = provider.stream(request).await?;
    while let Some(ev) = rx.recv().await {
        if let saorsa_ai::StreamEvent::ContentBlockDelta {
            delta: saorsa_ai::ContentDelta::TextDelta { text },
            ..
        } = ev?
        {
            print!("{text}");
        }
    }
    Ok(())
}

Provider Catalog

Anthropic (Claude)

Detail Value
Endpoint https://api.anthropic.com/v1/messages
Auth x-api-key header
Streaming Server-Sent Events (SSE)
API version 2023-06-01

Models:

Model Context Tools Vision
claude-opus-4 200k Yes Yes
claude-sonnet-4 200k Yes Yes
claude-haiku-4 200k Yes Yes
claude-3-5-sonnet 200k Yes Yes
claude-3-5-haiku 200k Yes Yes
claude-3-opus 200k Yes Yes

let config = ProviderConfig::new(
    ProviderKind::Anthropic,
    "sk-ant-...",
    "claude-sonnet-4",
);

OpenAI

Detail Value
Endpoint https://api.openai.com/v1/chat/completions
Auth Authorization: Bearer
Streaming Server-Sent Events (SSE)

Models:

Model Context Tools Vision
gpt-4o 128k Yes Yes
gpt-4o-mini 128k Yes Yes
gpt-4-turbo 128k Yes Yes
o1 200k Yes Yes
o3-mini 200k Yes No

let config = ProviderConfig::new(
    ProviderKind::OpenAi,
    "sk-...",
    "gpt-4o",
);

Google Gemini

Detail Value
Endpoint https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent
Auth x-goog-api-key header
Streaming SSE via streamGenerateContent?alt=sse

Models:

Model Context Tools Vision
gemini-2.0-flash 1M Yes Yes
gemini-1.5-pro 2M Yes Yes
gemini-1.5-flash 1M Yes Yes

let config = ProviderConfig::new(
    ProviderKind::Gemini,
    "AIza...",
    "gemini-2.0-flash",
);

Ollama (Local)

Detail Value
Endpoint http://localhost:11434/api/chat
Auth Optional Bearer token
Streaming Newline-delimited JSON (NDJSON)

Models:

Model Context Tools Vision
llama3 8k No No
llama3.1 131k Yes No
codellama 16k No No
mistral 32k Yes No
mixtral 32k Yes No
llava 4k No Yes

let config = ProviderConfig::new(
    ProviderKind::Ollama,
    "", // No API key needed for local
    "llama3.1",
).with_base_url("http://localhost:11434");

OpenAI-Compatible Providers

saorsa-ai works with any service that implements the OpenAI API format. Factory functions are provided for popular services:

use saorsa_ai::openai_compat;

// Azure OpenAI
let provider = openai_compat::azure_openai(
    "your-api-key",
    "https://your-resource.openai.azure.com",
    "your-deployment",
    "2024-02-01",
)?;

// Groq
let provider = openai_compat::groq("gsk_...", "llama-3.1-70b-versatile")?;

// Mistral
let provider = openai_compat::mistral("your-key", "mistral-large-latest")?;

// OpenRouter
let provider = openai_compat::openrouter("sk-or-...", "anthropic/claude-3.5-sonnet")?;

// xAI (Grok)
let provider = openai_compat::xai("xai-...", "grok-2")?;

// Cerebras
let provider = openai_compat::cerebras("csk-...", "llama3.1-70b")?;

For custom endpoints, use the builder:

use saorsa_ai::openai_compat::OpenAiCompatProvider;
use saorsa_ai::{ProviderConfig, ProviderKind};

let config = ProviderConfig::new(
    ProviderKind::OpenAiCompatible,
    "your-api-key",
    "model-name",
).with_base_url("https://llm.example.com");

let provider = OpenAiCompatProvider::builder(config)
    .url_path("/v2/chat/completions")  // Custom API path
    .auth_header("X-Custom-Key")       // Custom auth header
    .extra_header("X-Project-Id", "my-project")
    .build()?;

Streaming

All providers return a unified stream of StreamEvent values via a tokio mpsc::Receiver:

let mut rx = provider.stream(request).await?;

while let Some(event) = rx.recv().await {
    match event? {
        StreamEvent::MessageStart { id, model, usage } => {
            // Stream started
        }
        StreamEvent::ContentBlockStart { index, content_block } => {
            // New content block (text or tool use)
        }
        StreamEvent::ContentBlockDelta { index, delta } => {
            match delta {
                ContentDelta::TextDelta { text } => {
                    // Incremental text
                }
                ContentDelta::InputJsonDelta { partial_json } => {
                    // Incremental tool call JSON
                }
            }
        }
        StreamEvent::ContentBlockStop { index } => {
            // Content block complete
        }
        StreamEvent::MessageDelta { stop_reason, usage } => {
            // Final metadata (stop reason, token usage)
        }
        StreamEvent::MessageStop => {
            // Stream complete
        }
        StreamEvent::Ping => {
            // Keepalive
        }
        StreamEvent::Error { message } => {
            // Stream error
        }
    }
}

Each provider translates its native streaming format (SSE or NDJSON) into the same event sequence. A background tokio task handles the parsing.
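
Because the event stream is uniform, downstream logic such as a transcript accumulator is provider-independent. A short sketch, reusing a provider and request from the earlier examples:

use saorsa_ai::{ContentDelta, StreamEvent};

// Drain the receiver into a single String, whichever backend produced it.
let mut full_text = String::new();
let mut rx = provider.stream(request).await?;
while let Some(event) = rx.recv().await {
    if let StreamEvent::ContentBlockDelta {
        delta: ContentDelta::TextDelta { text },
        ..
    } = event?
    {
        full_text.push_str(&text);
    }
}
println!("{full_text}");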

Tool Calling

Define tools using JSON Schema and handle tool use/result cycles:

use saorsa_ai::{
    CompletionRequest, ContentBlock, Message, StopReason, ToolDefinition,
};

// 1. Define a tool
let tool = ToolDefinition::new(
    "get_weather",
    "Get the current weather for a city",
    serde_json::json!({
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "City name"
            }
        },
        "required": ["city"]
    }),
);

// 2. Build the conversation and send a request with the tool attached
let mut messages = vec![Message::user("What is the weather in Paris?")];
let request = CompletionRequest::new("claude-sonnet-4", messages.clone(), 1024)
    .tools(vec![tool]);

let response = provider.complete(request).await?;

// 3. Handle tool use
if response.stop_reason == Some(StopReason::ToolUse) {
    for block in &response.content {
        if let ContentBlock::ToolUse { id, name, input } = block {
            // Execute the tool (your logic here)
            let result = execute_tool(name, input);

            // 4. Send result back
            messages.push(Message::tool_result(id, result));
        }
    }

    // 5. Continue the conversation with tool results
    let followup = CompletionRequest::new("claude-sonnet-4", messages, 1024);
    let final_response = provider.complete(followup).await?;
}

Tool calling works identically across all providers - saorsa-ai handles the format translation between Anthropic's native tool blocks, OpenAI's function calling, Gemini's function declarations, and Ollama's format.
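
The execute_tool call in step 3 is application code, not part of saorsa-ai. A hypothetical dispatcher for the get_weather tool might look like this (the canned weather response is illustrative):

use serde_json::{json, Value};

// Hypothetical tool dispatcher; a real application would call an actual
// weather service here and serialize its response.
fn execute_tool(name: &str, input: &Value) -> String {
    match name {
        "get_weather" => {
            let city = input["city"].as_str().unwrap_or("unknown");
            json!({ "city": city, "temp_c": 18, "conditions": "cloudy" }).to_string()
        }
        _ => format!("unknown tool: {name}"),
    }
}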

Model Registry

Look up model metadata at runtime:

use saorsa_ai::models;

// Exact match
if let Some(info) = models::lookup_model("gpt-4o") {
    println!("Context: {} tokens", info.context_window);
    println!("Tools: {}", info.supports_tools);
    println!("Vision: {}", info.supports_vision);
}

// Prefix match (for versioned model IDs)
let info = models::lookup_model_by_prefix("claude-sonnet-4-5-20250929");
// Matches "claude-sonnet-4"

// Individual queries
let ctx = models::get_context_window("gemini-1.5-pro"); // Some(2_000_000)
let tools = models::supports_tools("llama3");            // Some(false)
let vision = models::supports_vision("gpt-4o");          // Some(true)

Token Counting

Estimate token usage for context window management:

use saorsa_ai::tokens;

// Estimate tokens in text (~4 chars per token)
let count = tokens::estimate_tokens("Hello, world!");

// Estimate message tokens (includes per-message overhead)
let msg_tokens = tokens::estimate_message_tokens(&message);

// Estimate full conversation
let total = tokens::estimate_conversation_tokens(&messages, Some("system prompt"));

// Check if conversation fits within model's context
let fits = tokens::fits_in_context(
    &messages,
    Some("system prompt"),
    "claude-sonnet-4",
    4096, // max output tokens
);

Token counting is heuristic-based (~4 characters per token for English). For precise counts, use provider-specific tokenizers.
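
One common use is trimming history before sending a request. A sketch built on the boolean check above (the oldest-first eviction policy and the 4096-token output reservation are illustrative):

use saorsa_ai::tokens;

// Evict the oldest messages until the conversation fits the model's
// context window, reserving room for 4096 output tokens.
while messages.len() > 1
    && !tokens::fits_in_context(&messages, Some("system prompt"), "claude-sonnet-4", 4096)
{
    messages.remove(0);
}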

Error Handling

All operations return Result<T, SaorsaAiError>:

pub enum SaorsaAiError {
    /// Provider-specific error
    Provider { provider: String, message: String },
    /// Authentication failure (invalid or missing API key)
    Auth(String),
    /// Network error (connection, DNS, timeout)
    Network(String),
    /// Rate limit exceeded
    RateLimit(String),
    /// Invalid request parameters
    InvalidRequest(String),
    /// Streaming error
    Streaming(String),
    /// Token limit exceeded
    TokenLimit(String),
    /// JSON serialization/deserialization error
    Json(serde_json::Error),
    /// I/O error
    Io(std::io::Error),
    /// Internal error
    Internal(String),
}
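
Matching on the variants lets callers centralize policy, for example retrying only transient failures. A sketch assuming CompletionRequest implements Clone (the three-attempt, fixed one-second backoff is illustrative):

use std::time::Duration;

use saorsa_ai::SaorsaAiError;

let mut attempts = 0;
let response = loop {
    match provider.complete(request.clone()).await {
        Ok(resp) => break resp,
        // Retry rate limits and network errors; surface everything else.
        Err(e @ (SaorsaAiError::RateLimit(_) | SaorsaAiError::Network(_)))
            if attempts < 3 =>
        {
            attempts += 1;
            eprintln!("transient error (attempt {attempts}): {e}");
            tokio::time::sleep(Duration::from_secs(1)).await;
        }
        Err(e) => return Err(e),
    }
};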

Core Types Reference

Type Description
Provider Trait for non-streaming completions
StreamingProvider Trait extending Provider with streaming
ProviderConfig Configuration for creating a provider
ProviderKind Enum of provider types (Anthropic, OpenAi, Gemini, Ollama, OpenAiCompatible)
ProviderRegistry Factory for creating providers from config
CompletionRequest Builder for completion requests
CompletionResponse Parsed completion response
Message Conversation message (user, assistant, tool result)
Role Message role (User, Assistant)
ContentBlock Message content (Text, ToolUse, ToolResult)
ContentDelta Streaming delta (TextDelta, InputJsonDelta)
StreamEvent Streaming event (message start/stop, content deltas, errors)
StopReason Why generation stopped (EndTurn, MaxTokens, StopSequence, ToolUse)
Usage Token usage (input_tokens, output_tokens)
ToolDefinition Tool schema for function calling
ModelInfo Model metadata (context window, capabilities)

Dependencies

Crate Purpose
reqwest HTTP client (rustls-tls)
reqwest-eventsource Server-Sent Events parsing
tokio Async runtime
futures Async stream utilities
async-trait Async trait support
serde / serde_json JSON serialization
tracing Structured logging
thiserror Error type derivation

Minimum Supported Rust Version

The MSRV is 1.88 (Rust Edition 2024). This is enforced in CI.
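
Downstream crates that want to track the same floor can declare it in their own manifest (standard Cargo fields, shown here for illustration):

[package]
edition = "2024"
rust-version = "1.88"  # matches saorsa-ai's MSRV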

License

Licensed under either of:

  • Apache License, Version 2.0
  • MIT License

at your option.

Contributing

Part of the saorsa-tui workspace. See the workspace root for contribution guidelines.
