#llama #ollama #ollama-api #api-client #llm

bin+lib ollama-oxide

A Rust library for integrating with Ollama's native API, providing low-level inference and high-level conveniences

5 unstable releases

Uses new Rust 2024

0.2.0 Apr 4, 2026
0.1.2 Feb 15, 2026
0.1.1 Feb 15, 2026
0.1.0 Feb 14, 2026
0.0.1 Jan 12, 2026

#394 in Asynchronous


235 downloads per month

MIT license

325KB
5K SLoC

ollama-oxide


ollama-oxide (the llama in the crate) is a Rust library providing low-level primitives and high-level conveniences for integrating with [Ollama](https://github.com/ollama)'s native API.

Features

  • Low-level primitives for direct Ollama API interaction
  • High-level conveniences (optional) for common use cases
  • Async/await support with Tokio runtime
  • Type-safe API bindings generated from OpenAPI specs
  • Comprehensive error handling
  • HTTP/2 support via reqwest
  • Feature flags for modular dependencies
  • Streaming chat (POST /api/chat as NDJSON) via chat_stream / chat_stream_blocking; see the chat_stream_async and chat_stream_sync examples (thinking models: chat_stream_think_async, chat_stream_think_sync) and the sketch after this list
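
To make the NDJSON wire format concrete, the following is a minimal sketch of streaming a chat reply from Ollama's native POST /api/chat using reqwest's blocking client and serde_json directly. It illustrates what the streaming helpers wrap rather than this crate's own API; the model name, endpoint URL, and dependency features are assumptions for the example.

// A hand-rolled NDJSON chat stream against Ollama's native API (sketch, not
// this crate's API). Assumes reqwest with the "blocking" and "json" features,
// serde_json, and a local Ollama server with the model already pulled.
use std::io::{BufRead, BufReader, Write};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = serde_json::json!({
        "model": "llama3.2",   // assumed model name
        "messages": [{ "role": "user", "content": "Why is the sky blue?" }],
        "stream": true
    });

    let resp = reqwest::blocking::Client::new()
        .post("http://localhost:11434/api/chat") // default local Ollama address
        .json(&body)
        .send()?
        .error_for_status()?;

    // Ollama streams one JSON object per line (NDJSON) until "done": true.
    for line in BufReader::new(resp).lines() {
        let chunk: serde_json::Value = serde_json::from_str(&line?)?;
        if let Some(token) = chunk["message"]["content"].as_str() {
            print!("{token}");
            std::io::stdout().flush()?;
        }
        if chunk["done"].as_bool() == Some(true) {
            break;
        }
    }
    Ok(())
}

The chat_stream / chat_stream_blocking helpers listed above handle this per-line framing and deserialization for you.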

Architecture

Single-crate design with modular structure and feature flags:

ollama-oxide/
└── src/
    ├── lib.rs           # Main library entry point
    ├── inference/       # Inference types: chat, generate, embed (default)
    ├── http/            # HTTP client layer (default)
    ├── tools/           # Ergonomic function calling (optional)
    ├── model/           # Model management (optional)
    └── conveniences/    # High-level APIs (optional)

Feature Flags

The library uses feature flags to let you include only what you need:

| Feature | Dependencies | Purpose |
|---|---|---|
| default | http, inference | Standard usage: HTTP client + all inference types |
| inference | - | Standalone inference types (chat, generate, embed) |
| http | - | HTTP client implementation (async/sync) |
| tools | schemars, futures | Ergonomic function calling with auto-generated JSON schemas |
| model | http, inference | Model management API (list, show, copy, create, delete) |
| conveniences | http, inference | High-level ergonomic APIs |
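
To illustrate what "auto-generated JSON schemas" means for the tools feature: schemars (which that feature pulls in) can derive a JSON Schema from a plain Rust struct, and that schema is the kind of parameter description a function-calling request carries. The struct and tool below are invented for the example and are not part of this crate.

// Hypothetical tool-argument struct; schemars derives its JSON Schema.
use schemars::{schema_for, JsonSchema};

/// Arguments for a made-up weather-lookup tool.
#[derive(JsonSchema)]
struct WeatherArgs {
    /// City to look up, e.g. "Berlin".
    city: String,
    /// Report in fahrenheit instead of celsius.
    fahrenheit: Option<bool>,
}

fn main() {
    let schema = schema_for!(WeatherArgs);
    println!("{}", serde_json::to_string_pretty(&schema).unwrap());
}

Doc comments on the fields end up as description entries in the generated schema, so the model receives human-readable parameter hints.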

Installation

Add one of the following to your Cargo.toml, depending on the features you need:

# Default features (inference + http)
[dependencies]
ollama-oxide = "0.2.0"

# With function calling support
[dependencies]
ollama-oxide = { version = "0.2.0", features = ["tools"] }

# With model management
[dependencies]
ollama-oxide = { version = "0.2.0", features = ["model"] }

# Full featured
[dependencies]
ollama-oxide = { version = "0.2.0", features = ["tools", "model"] }

# Inference types only (no HTTP client)
[dependencies]
ollama-oxide = { version = "0.2.0", default-features = false, features = ["inference"] }

Quick Start

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder: see the examples/ directory for working usage.
    todo!("Working ");
}

Requirements

  • Rust 1.85+ (edition 2024)
  • Ollama running locally or accessible via network

Development

Building

cargo build

Running Tests

cargo test

Running Examples

cargo run --example basic_generation

Streaming chat (requires a running Ollama server):

cargo run --example chat_stream_async
cargo run --example chat_stream_sync

API Documentation

The library follows Ollama's OpenAPI specifications (see spec/primitives/).

12 Total Endpoints:

  • 5 Simple endpoints (version, tags, ps, copy, delete)
  • 2 Medium complexity (show, embed)
  • 5 Complex endpoints where Ollama supports streaming modes (generate, chat, create, pull, push) — chat NDJSON streaming is implemented in this crate; other streaming modes may follow in later releases

See spec/api-analysis.md for detailed endpoint documentation.
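
For a sense of what the simple endpoints involve, the snippet below calls GET /api/version by hand with reqwest; it is an illustration of the underlying HTTP API, not of this crate's typed bindings.

// Calling the simplest Ollama endpoint directly (sketch; assumes reqwest with
// the "blocking" and "json" features, serde_json, and a local server on 11434).
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let v: serde_json::Value =
        reqwest::blocking::get("http://localhost:11434/api/version")?.json()?;
    println!("Ollama version: {}", v["version"].as_str().unwrap_or("unknown"));
    Ok(())
}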

Contributing

Contributions are welcome! Please read CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Based on Ollama's official libraries and API specifications.

Dependencies

~6–22MB
~258K SLoC