ollama-native 🐑


ollama-native is a minimalist Ollama Rust SDK that provides the most basic functionality for interacting with Ollama.

Goals 🎯

  • ✅ Provide access to the core Ollama API functions for interacting with models.
  • ❌ The project does not include any business-specific functionality like chat with history.

[!TIP] Features like chat with history can be implemented at the business layer of your application (chat-with-history-example); a minimal sketch follows. Alternatively, you may choose another Ollama SDK that provides these higher-level features.
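As an illustration of the business-layer approach, the sketch below keeps the transcript in application code and replays it on each turn. It deliberately uses the crate's generate call rather than the chat endpoint so as not to guess at the chat request types, and it assumes the completion response exposes the generated text in a response field (as the streaming items shown later do) and an error type compatible with Box<dyn std::error::Error>; see the linked example for the real approach.

use ollama_native::Ollama;

// Application-owned chat history: the SDK stays stateless, the app
// stores the transcript and resends it inside each prompt.
struct ChatSession {
    ollama: Ollama,
    transcript: String,
}

impl ChatSession {
    fn new(url: &str) -> Self {
        Self {
            ollama: Ollama::new(url),
            transcript: String::new(),
        }
    }

    async fn say(&mut self, user_message: &str) -> Result<String, Box<dyn std::error::Error>> {
        // Append the user turn, then ask the model to continue the dialogue.
        self.transcript.push_str(&format!("User: {user_message}\nAssistant: "));
        let response = self
            .ollama
            .generate("llama3.1:8b", &self.transcript)
            .await?;
        // Assumption: the generated text lives in a `response` field.
        self.transcript.push_str(&format!("{}\n", response.response));
        Ok(response.response)
    }
}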

APIs 📝

  • Generate a completion
  • Generate a chat completion
  • Create a Model
  • List Local Models
  • Show Model Information
  • Delete a Model
  • Pull a Model
  • Push a Model
  • Generate Embeddings
  • List Running Models
  • Version
  • Check if a Blob Exists
  • Push a Blob

Features 🧬

  • Minimal Functionality: Offers the core functionalities of Ollama without extra features or complexity.
  • Rusty APIs: Utilizes chainable methods, making the API simple, concise, and idiomatic to Rust.

API Design

Completion

A conventional builder-style API would look like this:

let options = OptionsBuilder::new()
    .stop("stop")
    .num_predict(42)
    .seed(42)
    .build();

let request = GenerateRequestBuilder::new()
    .model("llama3.1:8b")
    .prompt("Tell me a joke")
    .options(options)
    .build();

let response = ollama.generate(request).await?;

ollama-native uses chainable methods instead:

let response = ollama
    .generate("llama3.1:8b", "Tell me a joke")
    .stop("stop")
    .num_predict(42)
    .seed(42)
    .await?;
Streaming Response

Builder style:

let options = OptionsBuilder::new()
    .stop("stop")
    .num_predict(42)
    .seed(42)
    .build();

let request = GenerateStreamRequestBuilder::new()
    .model("llama3.1:8b")
    .prompt("Tell me a joke")
    .options(options)
    .build();

let stream = ollama.generate_stream(request).await?;

The chainable equivalent:

let stream = ollama
    .generate("llama3.1:8b", "Tell me a joke")
    .stop("stop")
    .num_predict(42)
    .seed(42)
    .stream() // Specify streaming response.
    .await?;

Usage 🔦

Add dependencies

default features (generate, chat, version)

cargo add ollama-native

stream features

cargo add ollama-native --features stream

model features (create models, pull models...)

cargo add ollama-native --features model
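The same dependencies can be declared directly in Cargo.toml; a sketch combining both optional features (pick the version you actually want):

[dependencies]
ollama-native = { version = "1.0", features = ["stream", "model"] }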

Generate a completion

use ollama_native::Ollama;

let ollama = Ollama::new("http://localhost:11434");

let response = ollama
    .generate("llama3.1:8b", "Tell me a joke about sharks")
    .seed(5)
    .temperature(3.2)
    .await?;
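The snippets in this section assume an async context. A complete program might look like the following sketch; the #[tokio::main] runtime, the Box<dyn std::error::Error> error type, and the response field on the reply are assumptions for illustration:

use ollama_native::Ollama;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let ollama = Ollama::new("http://localhost:11434");

    let response = ollama
        .generate("llama3.1:8b", "Tell me a joke about sharks")
        .seed(5)
        .await?;

    // Assumption: the generated text lives in a `response` field,
    // mirroring the streaming items below.
    println!("{}", response.response);
    Ok(())
}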

Generate request (Streaming)

use ollama_native::{IntoStream, Ollama};
use tokio::io::AsyncWriteExt;
use tokio_stream::StreamExt;

let ollama = Ollama::new("http://localhost:11434");

let mut stream = ollama
    .generate("llama3.1:8b", "Tell me a joke about sharks")
    .stream()
    .await?;

let mut out = tokio::io::stdout();
while let Some(Ok(item)) = stream.next().await {
    out.write_all(item.response.as_bytes()).await?;
    out.flush().await?;
}
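Note that while let Some(Ok(item)) quietly stops at the first stream error. A variant like this (same assumptions as above) propagates errors instead:

while let Some(item) = stream.next().await {
    let item = item?; // surface mid-stream errors instead of swallowing them
    out.write_all(item.response.as_bytes()).await?;
    out.flush().await?;
}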

Structured Output

[!TIP] See structured outputs example for more details.

JSON Mode

// JSON mode
let response = ollama
    .generate(
        "llama3.1:8b",
        "Ollama is 22 years old and is busy saving the world.",
    )
    .format("json") // Use "json" to get the response in JSON format.
    .await?;
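Since JSON mode returns the completion as a JSON string, it can be parsed with serde_json (an additional dependency; the response field name is an assumption, as above):

// Parse the model's JSON text into a dynamic value.
let value: serde_json::Value = serde_json::from_str(&response.response)?;
println!("{value:#}"); // `{:#}` pretty-prints a serde_json::Value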

Specified JSON Format

// Specified JSON format.
let output_format = r#"
{
    "type": "object",
    "properties": {
        "age": {
            "type": "integer"
        },
        "available": {
            "type": "boolean"
        }
    },
    "required": [
        "age",
        "available"
    ]
}"#;

let response = ollama
    .generate(
        "llama3.1:8b",
        "Ollama is 22 years old and is busy saving the world.",
    )
    .format(output_format)
    .await?;
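Because the schema fixes the shape of the output, the reply can be deserialized straight into a matching struct; a sketch assuming serde and serde_json as extra dependencies and the response field as above:

use serde::Deserialize;

#[derive(Deserialize)]
struct Status {
    age: i64,        // matches the "age" integer in the schema
    available: bool, // matches the "available" boolean
}

let status: Status = serde_json::from_str(&response.response)?;
println!("age = {}, available = {}", status.age, status.available);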

Examples 📖

License 📄

This project is licensed under the MIT license.

Acknowledgments

Thanks to mongodb for providing such an elegant design pattern.

Isabel Atkinson: “Rustify Your API: A Journey from Specification to Implementation” | RustConf 2024
