#apple-silicon #inference #nng #ipc #llm-inference #llm

orchard-rs

Rust client for Orchard - high-performance LLM inference on Apple Silicon

13 stable releases

2026.3.7 Mar 14, 2026
2026.3.5 Mar 12, 2026
2026.2.4 Feb 14, 2026
2026.1.4 Jan 27, 2026
2025.12.2 Dec 27, 2025

#982 in Asynchronous

Apache-2.0

505KB
8K SLoC


Installation

[dependencies]
orchard-rs = "2026.3"

Usage

use orchard::{IPCClient, RequestOptions};

#[tokio::main]
async fn main() -> Result<(), orchard::Error> {
    // Connect to PIE (Proxy Inference Engine)
    let mut client = IPCClient::new();
    client.connect()?;

    // Send inference request
    let request_id = client.next_request_id();
    let mut stream = client.send_request(
        request_id,
        "qwen-2.5-coder-32b",
        "/path/to/model",
        "Explain quantum computing in simple terms.",
        RequestOptions {
            max_tokens: 500,
            temperature: 0.7,
            ..Default::default()
        },
    )?;

    // Stream response tokens, flushing stdout so each one appears as it arrives
    use std::io::Write;
    while let Some(delta) = stream.recv().await {
        if let Some(content) = delta.content {
            print!("{}", content);
            std::io::stdout().flush().ok();
        }
        if delta.is_final_delta {
            println!();
            break;
        }
    }

    client.disconnect();
    Ok(())
}

Features

  • High-performance IPC - NNG (nanomsg-next-generation) for minimal latency
  • Streaming responses - Async token streaming via tokio channels
  • Thread-safe - Lock-based design for concurrent access
  • Wire-compatible - Same binary protocol as orchard-py and orchard-swift

Requirements

  • Rust 1.70+
  • PIE (Proxy Inference Engine) running locally
  • macOS 14+ (Apple Silicon)

Model Profiles

Chat templates and control tokens are loaded from the Pantheon submodule at profiles/. This provides a single source of truth shared across all Orchard SDKs (Python, Rust, Swift).

License

Apache-2.0

Dependencies

~32–55MB
~1M SLoC