ollama-oxide
ollama-oxide ("the Llama in the Crate") is a Rust library providing low-level primitives and high-level conveniences for integrating with [Ollama](https://github.com/ollama)'s native API.
Features
- Low-level primitives for direct Ollama API interaction
- High-level conveniences (optional) for common use cases
- Async/await support with Tokio runtime
- Type-safe API bindings generated from OpenAPI specs
- Comprehensive error handling
- HTTP/2 support via reqwest
- Feature flags for modular dependencies
- Streaming chat: `POST /api/chat` as NDJSON via `chat_stream` / `chat_stream_blocking` (see examples `chat_stream_async`, `chat_stream_sync`; thinking models: `chat_stream_think_async`, `chat_stream_think_sync`); a sketch follows this list
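A minimal sketch of consuming the NDJSON chat stream is shown below. The client constructor, request builder, and chunk fields are assumptions for illustration only; the README only documents that `chat_stream` and `chat_stream_blocking` exist, so consult the `chat_stream_async` example for the real API.

```rust
use futures::StreamExt; // stream combinators for iterating the async NDJSON stream

// Hypothetical names: `OllamaClient`, `ChatRequest`, and the chunk fields below
// are assumptions, not the crate's documented API.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ollama_oxide::OllamaClient::new("http://localhost:11434");
    let request = ollama_oxide::ChatRequest::new("llama3.2")
        .user_message("Why is the sky blue?");

    // `chat_stream` issues POST /api/chat and yields one parsed NDJSON chunk at a time.
    let mut stream = client.chat_stream(request).await?;
    while let Some(chunk) = stream.next().await {
        print!("{}", chunk?.message.content);
    }
    Ok(())
}
```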
Architecture
Single-crate design with modular structure and feature flags:
ollama-oxide/
└── src/
├── lib.rs # Main library entry point
├── inference/ # Inference types: chat, generate, embed (default)
├── http/ # HTTP client layer (default)
├── tools/ # Ergonomic function calling (optional)
├── model/ # Model management (optional)
└── conveniences/ # High-level APIs (optional)
Feature Flags
The library uses feature flags to let you include only what you need:
| Feature | Dependencies | Purpose |
|---|---|---|
| `default` | `http`, `inference` | Standard usage: HTTP client + all inference types |
| `inference` | - | Standalone inference types (chat, generate, embed) |
| `http` | - | HTTP client implementation (async/sync) |
| `tools` | `schemars`, `futures` | Ergonomic function calling with auto-generated JSON schemas |
| `model` | `http`, `inference` | Model management API (list, show, copy, create, delete) |
| `conveniences` | `http`, `inference` | High-level ergonomic APIs |
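The `tools` feature relies on `schemars` to auto-generate JSON Schemas for tool parameters. The derive and macro below are standard `schemars` usage; the tool name `GetWeather` and how the resulting schema would be attached to a chat request are assumptions, not this crate's documented API.

```rust
use schemars::{schema_for, JsonSchema};
use serde::Deserialize;

/// Parameters for a hypothetical `get_weather` tool; the derive generates the
/// JSON Schema that describes the tool's arguments to the model.
#[derive(Debug, Deserialize, JsonSchema)]
struct GetWeather {
    /// City and country, e.g. "Lisbon, Portugal".
    location: String,
    /// Temperature unit, "celsius" or "fahrenheit".
    unit: Option<String>,
}

fn main() {
    // `schema_for!` is provided by schemars; printing it shows the schema that
    // auto-generated function calling would send for this parameter type.
    let schema = schema_for!(GetWeather);
    println!("{}", serde_json::to_string_pretty(&schema).unwrap());
}
```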
Installation
Add this to your Cargo.toml:
# Default features (inference + http)
[dependencies]
ollama-oxide = "0.2.0"
# With function calling support
[dependencies]
ollama-oxide = { version = "0.2.0", features = ["tools"] }
# With model management
[dependencies]
ollama-oxide = { version = "0.2.0", features = ["model"] }
# Full featured
[dependencies]
ollama-oxide = { version = "0.2.0", features = ["tools", "model"] }
# Inference types only (no HTTP client)
[dependencies]
ollama-oxide = { version = "0.2.0", default-features = false, features = ["inference"] }
Quick Start
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Working example forthcoming; see the runnable programs under examples/.
    todo!("Working example")
}
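Until the quick start is filled in, the sketch below shows roughly what a single non-streaming generation call might look like. `OllamaClient`, `GenerateRequest`, and the `response` field are assumptions for illustration; see the `basic_generation` example for the real API.

```rust
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical client pointing at a local Ollama server (default port 11434).
    let client = ollama_oxide::OllamaClient::new("http://localhost:11434");

    // Hypothetical request type mirroring Ollama's POST /api/generate body.
    let request = ollama_oxide::GenerateRequest::new("llama3.2", "Why is the sky blue?");

    let response = client.generate(request).await?;
    println!("{}", response.response);
    Ok(())
}
```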
Requirements
- Rust 1.85+ (edition 2024)
- Ollama running locally or reachable over the network (a quick reachability check is sketched below)
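One way to confirm the server is reachable, independent of this crate, is to call Ollama's `GET /api/version` endpoint directly. This sketch uses `reqwest` and assumes the default address `http://localhost:11434`.

```rust
// Standalone reachability check against Ollama's GET /api/version endpoint.
#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let body = reqwest::get("http://localhost:11434/api/version")
        .await?
        .text()
        .await?;
    println!("Ollama is up: {body}"); // prints JSON like {"version":"..."}
    Ok(())
}
```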
Development
Building
cargo build
Running Tests
cargo test
Running Examples
cargo run --example basic_generation
Streaming chat (requires a running Ollama server):
cargo run --example chat_stream_async
cargo run --example chat_stream_sync
API Documentation
The library follows Ollama's OpenAPI specifications (see spec/primitives/).
12 Total Endpoints:
- 5 Simple endpoints (version, tags, ps, copy, delete)
- 2 Medium complexity (show, embed)
- 5 Complex endpoints where Ollama supports streaming modes (generate, chat, create, pull, push) — chat NDJSON streaming is implemented in this crate; other streaming modes may follow in later releases
See spec/api-analysis.md for detailed endpoint documentation.
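For a sense of what the chat NDJSON stream carries, each line of a streaming response is a standalone JSON object. The sketch below mirrors Ollama's documented `/api/chat` response shape; the crate's own type names may differ, so treat this as illustration rather than the library's API.

```rust
use serde::Deserialize;

// Sketch of one NDJSON line from POST /api/chat in streaming mode. Field names
// follow Ollama's documented response shape; not necessarily the crate's types.
#[derive(Debug, Deserialize)]
struct ChatChunk {
    model: String,
    created_at: String,
    message: ChatMessage,
    done: bool, // true only on the final chunk, which also carries timing stats
}

#[derive(Debug, Deserialize)]
struct ChatMessage {
    role: String,    // "assistant" for streamed replies
    content: String, // incremental text; concatenate across chunks
}
```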
Contributing
Contributions are welcome! Please read CONTRIBUTING.md for guidelines.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
Based on Ollama's official libraries and API specifications.