Truston


A high-performance Rust client library for NVIDIA Triton Inference Server.

Truston provides a type-safe, ergonomic interface for communicating with Triton Inference Server via its REST API, supporting multiple data types, seamless NDArray integration, and async operations.

Features

  • 🚀 Type-safe inference - Strongly-typed input/output handling with compile-time guarantees
  • 🎯 Multiple data types - Support for all Triton data types (INT8-64, UINT8-64, FP32/64, BOOL, STRING, BF16)
  • 🔢 NDArray integration - Direct conversion between ndarray::ArrayD and Triton tensors
  • ⚡ Async/await - Built on tokio for efficient concurrent operations
  • 🛡️ Error handling - Comprehensive error types with context
  • 📊 Production-ready - Includes logging, tracing, and comprehensive tests

Installation

Add this to your Cargo.toml:

[dependencies]
truston = "0.1.1"
tokio = { version = "1.47", features = ["full"] }
ndarray = "0.16"

Quick Start

use truston::client::triton_client::TritonRestClient;
use truston::client::io::InferInput;
use ndarray::ArrayD;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a client
    let client = TritonRestClient::new("http://localhost:8000");
    
    // Check if server is alive
    client.is_server_live().await?;
    
    // Prepare input data
    let input_data: ArrayD<f32> = ArrayD::zeros(ndarray::IxDyn(&[1, 224, 224, 3]));
    let input = InferInput::from_ndarray("input", input_data);
    
    // Run inference
    let results = client.infer(vec![input], "resnet50").await?;
    
    // Access results
    for output in results.outputs {
        println!("Output: {} with shape {:?}", output.name, output.shape);
        
        if let Some(vec) = output.data.as_f32_vec() {
            // Guard against outputs with fewer than 5 values
            let n = vec.len().min(5);
            println!("First {} values: {:?}", n, &vec[..n]);
        }
    }
    
    Ok(())
}

Usage Examples

Creating Inputs from NDArray

use truston::client::io::InferInput;
use ndarray::array;

// From a 2D array
let arr = array![[1.0, 2.0], [3.0, 4.0]].into_dyn();
let input = InferInput::from_ndarray("my_input", arr);

Creating Inputs from Raw Vectors

use truston::client::io::{InferInput, DataType};

// For float32 data
let data = DataType::F32(vec![1.0, 2.0, 3.0, 4.0]);
let input = InferInput::new(
    "my_input".to_string(),
    vec![2, 2], // shape
    data
);

// For int64 data
let data = DataType::I64(vec![1, 2, 3, 4, 5, 6]);
let input = InferInput::new(
    "input_ids".to_string(),
    vec![1, 6],
    data
);

Multi-Input Models

use truston::client::triton_client::TritonRestClient;
use truston::client::io::InferInput;
use ndarray::ArrayD;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = TritonRestClient::new("http://localhost:8000");
    
    // BERT-style model with multiple inputs
    let input_ids: ArrayD<i64> = ArrayD::zeros(ndarray::IxDyn(&[1, 128]));
    let attention_mask: ArrayD<i64> = ArrayD::ones(ndarray::IxDyn(&[1, 128]));
    
    let inputs = vec![
        InferInput::from_ndarray("input_ids", input_ids),
        InferInput::from_ndarray("attention_mask", attention_mask),
    ];
    
    let results = client.infer(inputs, "bert_model").await?;
    
    Ok(())
}

Handling Outputs

use truston::client::io::DataType;

// `output` is one item from `results.outputs`

// Convert to vector
if let Some(vec) = output.data.as_f32_vec() {
    println!("F32 output: {:?}", vec);
}

// Convert to ndarray
if let Some(arr) = output.data.to_ndarray_f32(&output.shape) {
    println!("Array shape: {:?}", arr.shape());
    println!("Max value: {:?}", arr.iter().cloned().fold(f32::NEG_INFINITY, f32::max));
}

// Handle different types
match &output.data {
    DataType::F32(v) => println!("Float32: {} values", v.len()),
    DataType::I64(v) => println!("Int64: {} values", v.len()),
    DataType::String(v) => println!("Strings: {:?}", v),
    _ => println!("Other type"),
}

Error Handling

use truston::utils::errors::TrustonError;

match client.infer(inputs, "my_model").await {
    Ok(results) => {
        println!("Success! Got {} outputs", results.outputs.len());
    }
    Err(TrustonError::Http(msg)) => {
        eprintln!("Connection error: {}", msg);
    }
    Err(TrustonError::HttpErrorResponse(code, body)) => {
        eprintln!("Server error {}: {}", code, body);
    }
    Err(TrustonError::InferRequestError(msg)) => {
        eprintln!("Inference failed: {}", msg);
    }
    Err(TrustonError::InferParseError(msg)) => {
        eprintln!("Parse error: {}", msg);
    }
}

Supported Data Types

Rust Type   Triton Type   DataType Variant
bool        BOOL          DataType::Bool
u8          UINT8         DataType::U8
u16         UINT16        DataType::U16
u64         UINT64        DataType::U64
i8          INT8          DataType::I8
i16         INT16         DataType::I16
i32         INT32         DataType::I32
i64         INT64         DataType::I64
f32         FP32          DataType::F32
f64         FP64          DataType::F64
String      STRING        DataType::String
u16 (raw)   BF16          DataType::Bf16
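The BF16 row stores raw u16 bit patterns. Because BF16 is simply the upper 16 bits of an IEEE-754 f32, the conversion can be sketched with std-only code; this version truncates the low mantissa bits (production code would round to nearest-even), and wrapping the resulting Vec<u16> in DataType::Bf16 follows the table above:

```rust
// Convert an f32 to its BF16 bit pattern by keeping the top 16 bits.
// (Truncation, not round-to-nearest-even, for brevity.)
fn f32_to_bf16_bits(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

fn main() {
    // 1.0f32 has bits 0x3F80_0000, so its BF16 bits are 0x3F80.
    assert_eq!(f32_to_bf16_bits(1.0), 0x3F80);
    assert_eq!(f32_to_bf16_bits(-2.0), 0xC000);
    let raw: Vec<u16> = [0.5_f32, 1.0, 1.5].iter().map(|&v| f32_to_bf16_bits(v)).collect();
    println!("{:04X?}", raw);
}
```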

Requirements

  • Rust 1.85 or later (the crate uses the Rust 2024 edition)
  • Triton Inference Server (any version supporting v2 REST API)

Running Tests

Some tests require a running Triton server:

# Run all tests except integration tests
cargo test

# Run integration tests (requires Triton server at localhost:50000)
cargo test -- --ignored

Examples

Check the examples/ directory for more usage examples:

# Check server connection
cargo run --example client_connect

# Run inference test
cargo run --example infer_test

# NDArray examples
cargo run --example ndarray_coba

Documentation

For detailed API documentation, run:

cargo doc --open

Or visit docs.rs/truston.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

Licensed under the MIT License.
