# llm-stack

Production-ready Rust SDK for LLM providers

Quick Start • Features • Documentation • Examples • Contributing
## Overview
llm-stack is a unified Rust interface for building LLM-powered applications. Write against one set of types, swap providers without changing application code.
```rust
use llm_stack::{ChatMessage, ChatParams, Provider};
use llm_stack_anthropic::AnthropicProvider;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let provider = AnthropicProvider::from_env()?;

    let response = provider.generate(&ChatParams {
        messages: vec![ChatMessage::user("What is Rust's ownership model?")],
        max_tokens: Some(1024),
        ..Default::default()
    }).await?;

    println!("{}", response.text().unwrap_or_default());
    Ok(())
}
```
## Why llm-stack?

- 🔌 Provider Agnostic — Same code works with Anthropic, OpenAI, Ollama, or any custom provider
- 🛠️ Batteries Included — Tool execution, structured output, streaming, retry logic out of the box
- 🎯 Type Safe — `generate_object::<T>()` returns `T`, not `serde_json::Value`
- ⚡ Production Ready — Comprehensive error handling, token tracking, cost calculation
- 🧪 Testable — `MockProvider` for unit tests, no network mocks needed
- 🦀 Rust Native — Async/await, strong typing, zero-cost abstractions
## Quick Start

Add to your `Cargo.toml`:

```toml
[dependencies]
llm-stack = "0.1"
llm-stack-anthropic = "0.1" # or llm-stack-openai, llm-stack-ollama
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
```

Set your API key:

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
# or OPENAI_API_KEY for OpenAI
```
Run:

```rust
use llm_stack::{ChatMessage, ChatParams, Provider};
use llm_stack_anthropic::AnthropicProvider;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let provider = AnthropicProvider::from_env()?;

    let response = provider.generate(&ChatParams {
        messages: vec![ChatMessage::user("Hello!")],
        ..Default::default()
    }).await?;

    println!("{}", response.text().unwrap_or_default());
    println!("Tokens: {} in, {} out",
        response.usage.input_tokens,
        response.usage.output_tokens);
    Ok(())
}
```
## Features

### Core Capabilities
| Feature | Description |
|---|---|
| Unified Provider Trait | Two methods define a provider: generate() + stream() |
| Streaming | First-class async streaming with StreamEvent types |
| Tool Execution | Register handlers, validate inputs, execute in agentic loops |
| Resumable Tool Loop | Caller-driven iteration with inspect/inject/stop between rounds |
| Structured Output | generate_object::<T>() with JSON Schema validation |
| Interceptors | Composable retry, timeout, logging, approval gates |
| Usage Tracking | Token counts, cost calculation in microdollars |
| Context Management | Token budget, compaction, emergency truncation, observation masking |
| Result Pipeline | Structural pruning → semantic extraction → disk-backed caching |
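The table above mentions cost calculation in microdollars. One reason to prefer integer microdollars (1 USD = 1,000,000 µ$) over floating-point dollars is that sums over many small per-token costs stay exact. A minimal pure-Rust sketch of the arithmetic; the per-token prices here are made-up illustrative numbers, not real provider pricing, and `cost_microdollars` is a hypothetical helper, not part of the crate's API:

```rust
/// Hypothetical per-token prices in microdollars. Illustrative only —
/// not real provider pricing and not an llm-stack API.
const INPUT_PRICE_MICRO: u64 = 3;   // µ$ per input token
const OUTPUT_PRICE_MICRO: u64 = 15; // µ$ per output token

/// Cost of one request in microdollars. Integer arithmetic avoids
/// float rounding drift when aggregating cost across many calls.
fn cost_microdollars(input_tokens: u64, output_tokens: u64) -> u64 {
    input_tokens * INPUT_PRICE_MICRO + output_tokens * OUTPUT_PRICE_MICRO
}

fn main() {
    let cost = cost_microdollars(1_000, 200);
    // 1_000 * 3 + 200 * 15 = 6_000 µ$ = $0.006
    println!("{cost} microdollars (${})", cost as f64 / 1_000_000.0);
}
```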
### Provider Support

| Provider | Crate | Models |
|---|---|---|
| Anthropic | `llm-stack-anthropic` | Claude 3.5 Sonnet, Claude 3 Opus/Haiku |
| OpenAI | `llm-stack-openai` | GPT-4o, GPT-4 Turbo, GPT-3.5 |
| Ollama | `llm-stack-ollama` | Llama 3, Mistral, CodeLlama, any local model |
### Tool Execution Engine

Build agentic applications with the tool loop:

```rust
use llm_stack::{
    ChatParams, ChatMessage, ToolRegistry, ToolLoopConfig,
    tool::{tool_fn, tool_loop},
    ToolDefinition, JsonSchema,
};
use serde_json::json;

// Define a tool
let mut registry: ToolRegistry<()> = ToolRegistry::new();
registry.register(tool_fn(
    ToolDefinition {
        name: "get_weather".into(),
        description: "Get current weather for a city".into(),
        parameters: JsonSchema::new(json!({
            "type": "object",
            "properties": {
                "city": { "type": "string" }
            },
            "required": ["city"]
        })),
        ..Default::default()
    },
    |input| async move {
        let city = input["city"].as_str().unwrap_or("unknown");
        Ok(format!("Weather in {city}: 72°F, sunny"))
    },
));

// Run the agentic loop
let result = tool_loop(
    &provider,
    &registry,
    ChatParams {
        messages: vec![ChatMessage::user("What's the weather in Tokyo?")],
        tools: Some(registry.definitions()),
        ..Default::default()
    },
    ToolLoopConfig::default(),
    &(),
).await?;

println!("Final answer: {}", result.response.text().unwrap_or_default());
```
### Resumable Tool Loop

For orchestration patterns that need control between iterations (multi-agent, event injection, context compaction):

```rust
use llm_stack::{ToolLoopHandle, TurnResult};

let mut handle = ToolLoopHandle::new(
    &provider, &registry, params, ToolLoopConfig::default(), &(),
);

loop {
    match handle.next_turn().await {
        TurnResult::Yielded(turn) => {
            // Text and tool results are directly available
            if let Some(text) = turn.assistant_text() {
                println!("LLM said: {text}");
            }
            turn.continue_loop(); // or inject_and_continue(), stop()
        }
        TurnResult::Completed(done) => {
            println!("Done: {}", done.response.text().unwrap_or_default());
            break;
        }
        TurnResult::Error(err) => {
            eprintln!("Error: {}", err.error);
            break;
        }
    }
}
```
See Tool documentation for the full API.
### Structured Output

Get typed responses with schema validation:

```rust
use llm_stack::structured::generate_object;
use serde::Deserialize;

#[derive(Deserialize, schemars::JsonSchema)]
struct MovieReview {
    title: String,
    rating: u8,
    summary: String,
}

let review: MovieReview = generate_object(
    &provider,
    &ChatParams {
        messages: vec![ChatMessage::user("Review the movie Inception")],
        ..Default::default()
    },
).await?;

println!("{}: {}/10 - {}", review.title, review.rating, review.summary);
```
### Interceptors

Add cross-cutting concerns without modifying provider code:

```rust
use llm_stack::intercept::{InterceptorStack, Retry, Timeout, Logging};
use std::time::Duration;

let registry = ToolRegistry::new()
    .with_interceptors(
        InterceptorStack::new()
            .with(Retry::new(3, Duration::from_millis(100)))
            .with(Timeout::new(Duration::from_secs(30)))
            .with(Logging::new())
    );
```
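`Retry::new(3, Duration::from_millis(100))` pairs an attempt count with a base delay. The crate's actual backoff policy isn't documented in this README; a common choice is exponential backoff, sketched below in plain Rust. The doubling schedule and the `backoff_delay` helper are assumptions for illustration, not necessarily what `Retry` implements:

```rust
use std::time::Duration;

/// Delay before retry attempt `n` (0-based), assuming exponential backoff:
/// base, 2*base, 4*base, ... This doubling policy is an assumption for
/// illustration; check the crate docs for Retry's actual schedule.
fn backoff_delay(base: Duration, attempt: u32) -> Duration {
    base * 2u32.pow(attempt)
}

fn main() {
    let base = Duration::from_millis(100);
    for attempt in 0..3 {
        // Prints 100ms, 200ms, 400ms for attempts 0, 1, 2
        println!("retry {attempt} after {:?}", backoff_delay(base, attempt));
    }
}
```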
## Documentation
| Guide | Description |
|---|---|
| Quick Start | Get up and running in 5 minutes |
| Architecture | Design principles and module overview |
| Providers | Provider configuration and selection |
| Tools | Tool execution and agentic loops |
| Structured Output | Type-safe LLM responses |
| Interceptors | Retry, timeout, logging, approval |
| Context Window | Token management and truncation |
| Migration Guide | Coming from the llm crate? |
### API Reference

```bash
cargo doc --open
```

Or view on docs.rs.
## Examples

### Streaming

```rust
use futures::StreamExt;
use llm_stack::stream::StreamEvent;

let mut stream = provider.stream(&params).await?;

while let Some(event) = stream.next().await {
    match event? {
        StreamEvent::TextDelta(text) => print!("{text}"),
        StreamEvent::Done { .. } => break,
        _ => {}
    }
}
```
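When streaming into a buffer or UI, the usual pattern is to fold text deltas into a growing string until the done event arrives. A pure-std sketch of that accumulation, using a simplified stand-in enum rather than the crate's actual `StreamEvent` type:

```rust
/// Simplified stand-in for llm_stack::stream::StreamEvent, for illustration
/// only; the real enum carries more variants and payload fields.
enum Event {
    TextDelta(String),
    Done,
}

/// Fold text deltas into the full response text, stopping at Done.
fn collect_text(events: impl IntoIterator<Item = Event>) -> String {
    let mut out = String::new();
    for event in events {
        match event {
            Event::TextDelta(t) => out.push_str(&t),
            Event::Done => break,
        }
    }
    out
}

fn main() {
    let events = vec![
        Event::TextDelta("Hel".into()),
        Event::TextDelta("lo".into()),
        Event::Done,
    ];
    assert_eq!(collect_text(events), "Hello");
}
```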
### Multi-Provider Setup

```rust
use llm_stack::{ProviderRegistry, ProviderConfig};

let registry = ProviderRegistry::new()
    .register("claude", AnthropicProvider::from_env()?)
    .register("gpt4", OpenAiProvider::from_env()?)
    .register("local", OllamaProvider::new("http://localhost:11434"));

// Select at runtime
let provider = registry.get("claude")?;
```
### Testing with MockProvider

```rust
use llm_stack::test_helpers::mock_for;

#[tokio::test]
async fn test_my_agent() {
    let mock = mock_for("test", "mock-model");
    mock.queue_response(ChatResponse {
        content: vec![ContentBlock::Text("Hello!".into())],
        ..Default::default()
    });

    let response = mock.generate(&params).await.unwrap();
    assert_eq!(response.text(), Some("Hello!"));
}
```
## Crate Map

| Crate | Purpose |
|---|---|
| `llm-stack` | Traits, types, errors, streaming, tools, interceptors |
| `llm-stack-anthropic` | Anthropic Claude provider |
| `llm-stack-openai` | OpenAI GPT provider |
| `llm-stack-ollama` | Ollama local provider |
## Development

### Prerequisites

- Rust 1.85+ (2024 edition)
- `just` — Command runner

### Commands

```bash
just gate    # Full CI check: fmt + clippy + test + doc
just test    # Run all tests
just clippy  # Lint with warnings as errors
just doc     # Build documentation
just fcheck  # Quick feedback: fmt + check
```

### Running Tests

```bash
# All tests
just test

# Specific test
just test-one test_tool_loop

# With output
just test-verbose
```
## Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

### Quick Checklist

- `just gate` passes (fmt, clippy, tests, docs)
- New features have tests
- Public APIs have documentation
- Commit messages follow Conventional Commits
## License
Licensed under the Apache License, Version 2.0. See LICENSE for details.
## Acknowledgments
Built on the shoulders of giants:
- Tokio — Async runtime
- reqwest — HTTP client
- serde — Serialization
- jsonschema — JSON Schema validation