
ollama-native 🐑


ollama-native is a minimalist Ollama Rust SDK that provides the most basic functionality for interacting with Ollama.

Goals 🎯

  • ✅ Provide access to the core Ollama API functions for interacting with models.
  • ❌ The project does not include any business-specific functionality like chat with history.

[!TIP] For users who need features like chat with history, these functionalities can be implemented at the business layer of your application (chat-with-history-example). Alternatively, you may choose to use other Ollama SDKs that provide these higher-level features.
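A minimal sketch of that business-layer approach, using only the generate/prompt calls shown in the Usage section below. The Message struct, the prompt formatting, the response field access, and the boxed error type are illustrative assumptions, not APIs of this crate.

use ollama_native::Ollama;

// Hypothetical application-side message type (not part of ollama-native).
struct Message {
    role: &'static str,
    content: String,
}

// Keep the conversation in application state and flatten it into a prompt.
async fn ask(
    ollama: &Ollama,
    history: &mut Vec<Message>,
    question: &str,
) -> Result<String, Box<dyn std::error::Error>> {
    history.push(Message { role: "user", content: question.to_string() });

    let prompt = history
        .iter()
        .map(|m| format!("{}: {}", m.role, m.content))
        .collect::<Vec<_>>()
        .join("\n");

    let answer = ollama
        .generate("llama3.1:8b")
        .prompt(prompt.as_str())
        .await?
        .response;

    history.push(Message { role: "assistant", content: answer.clone() });
    Ok(answer)
}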

Usage 🔦

Add dependencies

cargo add ollama-native

Generate a Completion

use ollama_native::Ollama;

let ollama = Ollama::new("http://localhost:11434");

let response = ollama
    .generate("llama3.1:8b")
    .prompt("Tell me a joke about sharks")
    .seed(5)
    .temperature(3.2)
    .await?;
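The snippet above needs an async runtime to drive the .await. A complete program might look like this, assuming tokio and that the crate's error converts into Box<dyn std::error::Error>; the response field name is taken from the sample output shown further down.

use ollama_native::Ollama;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let ollama = Ollama::new("http://localhost:11434");

    let response = ollama
        .generate("llama3.1:8b")
        .prompt("Tell me a joke about sharks")
        .seed(5)
        .temperature(3.2)
        .await?;

    println!("{}", response.response);
    Ok(())
}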

Generate Request (Streaming)

Add stream feature:

cargo add ollama-native --features stream

use ollama_native::{Ollama, action::IntoStream};
use tokio::io::AsyncWriteExt;
use tokio_stream::StreamExt;

let ollama = Ollama::new("http://localhost:11434");

let mut stream = ollama
    .generate("llama3.1:8b")
    .prompt("Tell me a joke about sharks")
    .stream()
    .await?;

let mut out = tokio::io::stdout();
while let Some(Ok(item)) = stream.next().await {
    out.write_all(item.response.as_bytes()).await?;
    out.flush().await?;
}
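The loop above silently drops any stream error. A variant that surfaces errors explicitly (same assumptions as the snippet above, plus the error type implementing Display):

let mut out = tokio::io::stdout();
while let Some(item) = stream.next().await {
    match item {
        Ok(chunk) => {
            out.write_all(chunk.response.as_bytes()).await?;
            out.flush().await?;
        }
        Err(e) => {
            eprintln!("stream error: {e}");
            break;
        }
    }
}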

Structured Output

[!TIP] See structured outputs example for more details.

JSON Mode

// JSON mode
let response = ollama
    .generate("llama3.1:8b")
    .prompt("Ollama is 22 years old and is busy saving the world.")
    .json() // Get the response in JSON format.
    .await?;

Specified JSON Format

let format = r#"
{
    "type": "object",
    "properties": {
        "age": {
            "type": "integer"
        },
        "available": {
            "type": "boolean"
        }
    },
    "required": [
        "age",
        "available"
    ]
}"#;

let response = ollama
    .generate("llama3.1:8b")
    .prompt("Ollama is 22 years old and is busy saving the world.")
    .format(format)
    .await?;
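To consume the structured output, the JSON text returned in the response field (see the sample outputs below) can be parsed with serde; serde and serde_json are extra dependencies of your application here, not part of ollama-native.

use serde::Deserialize;

#[derive(Deserialize)]
struct Info {
    age: i64,
    available: bool,
}

// The model's JSON text lives in the `response` field.
let info: Info = serde_json::from_str(&response.response)?;
println!("age: {}, available: {}", info.age, info.available);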

API Design 🧬

  • Minimal Functionality: Offers the core functionalities of Ollama without extra features or complexity.
  • Rusty Style: Utilizes chainable methods, making the API simple, concise, and idiomatic to Rust.
  • Fluent Response: Responses are automatically converted to the appropriate data structure based on the methods you call.
  • Unified APIs: Uses a consistent API for both streaming and non-streaming requests.

Chaining Methods

// Multiple-step request construction.
let options = OptionsBuilder::new()
    .stop("stop")
    .num_predict(42)
    .seed(42)
    .build();

let request = GenerateCompletionRequestBuilder::new()
    .model("llama3.1:8b")
    .prompt("Tell me a joke")
    .options(options)
    .build();

let response = ollama.generate(request).await?;

// Using method chaining to build requests.
let response = ollama
    .generate("llama3.1:8b")
    .prompt("Tell me a joke")
    .stop("stop")
    .num_predict(42)
    .seed(42)
    .await?;

Fluent Response

// Unload a model from memory.
let request = GenerateCompletionRequestBuilder::new()
    .model("llama3.1:8b")
    .keep_alive(0)
    .build();

// Fields irrelevant to this request are still returned (as None).
let response = ollama.generate(request).await?;
/*
{
  "model": "llama3.1:8b",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "response": "",
  "done": true,
  "done_reason": "unload",
  "context": None,
  "total_duration": None,
  "load_duration": None,
  "prompt_eval_count": None,
  "prompt_eval_duration": None,
  "eval_count": None,
  "eval_duration": None
}
*/

// The return type is converted automatically when `unload` is called,
// avoiding handling of fields that are never populated.
let response = ollama.generate("llama3.1:8b").unload().await?;
/*
{
  "model": "llama3.1:8b",
  "created_at": "2023-12-18T19:52:07.071755Z",
  "response": "",
  "done": true,
  "done_reason": "unload"
}
*/

Unified APIs

// Using a different API to implement streaming response.
let options = OptionsBuilder::new()
    .stop("stop")
    .num_predict(42)
    .seed(42)
    .build();

let request = GenerateStreamRequestBuilder::new()
    .model("llama3.1:8b")
    .prompt("Tell me a joke")
    .options(options)
    .build();


let stream = ollama.generate_stream(request).await?;

// Using the same API as non-streaming to implement streaming response.
let stream = ollama
    .generate("llama3.1:8b")
    .prompt("Tell me a joke")
    .stop("stop")
    .num_predict(42)
    .seed(42)
    .stream() // Specify streaming response.
    .await?;

APIs 📝

  • Generate a completion
  • Generate a chat completion
  • Create a Model
  • List Local Models
  • Show Model Information
  • Delete a Model
  • Pull a Model
  • Push a Model
  • Generate Embeddings
  • List Running Models
  • Version
  • Check if a Blob Exists
  • Push a Blob

Examples 📖

License ⚖️

This project is licensed under the MIT license.

Acknowledgments 🎉

Thanks to mongodb for providing such an elegant design pattern.

Isabel Atkinson: “Rustify Your API: A Journey from Specification to Implementation” | RustConf 2024
