toi_server
A proof-of-concept for a personal assistant server with type-safe tool search and tool usage via HTTP API endpoints.
Requirements
The server requires the following supporting services:
- A Postgres database with pgvector
- An OpenAI-compliant embedding API
- An OpenAI-compliant chat completions API
- A vLLM reranking API
The server binary also has some native dependencies, so the Docker image is the easiest way to get started.
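If you want to sanity-check the Postgres-with-pgvector requirement from Rust, a minimal smoke test along these lines works (this is purely illustrative and not part of the server; it only assumes Diesel with the postgres feature):

```rust
use diesel::prelude::*;
use diesel::sql_query;

fn main() {
    // Uses the same DATABASE_URL the server expects (see Configuration below).
    let database_url = std::env::var("DATABASE_URL").expect("DATABASE_URL must be set");
    let mut conn =
        PgConnection::establish(&database_url).expect("failed to connect to Postgres");

    // pgvector ships as a Postgres extension; enabling it needs sufficient privileges.
    sql_query("CREATE EXTENSION IF NOT EXISTS vector")
        .execute(&mut conn)
        .expect("pgvector extension is unavailable");

    println!("Postgres with pgvector is reachable");
}
```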
Configuration
At least two environment variables are required for configuration:
- DATABASE_URL: required by Diesel for connecting to the backing database
- TOI_CONFIG_PATH: path to the server configuration file
The server configuration file at the path given by TOI_CONFIG_PATH
should define HTTP client options for the embedding, generation, and
reranking APIs. It also supports environment variable interpolation for some
values, so you can write something like this to keep API secrets out of the file:
```json
{
  "server": {
    "bind_addr": "0.0.0.0:6969",
    "user_agent": "${USER_AGENT}"
  },
  "embedding": {
    "base_url": "http://embedding:8000"
  },
  "generation": {
    "base_url": "http://generation:8000",
    "headers": {
      "api_key": "${MY_API_KEY}"
    }
  },
  "reranking": {
    "base_url": "http://reranking:8000"
  }
}
```
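The interpolation itself is straightforward. Here is a minimal sketch of what a ${VAR} substitution pass over the raw file contents might look like; this is a hypothetical stand-in, not the crate's actual implementation:

```rust
/// Replace every `${NAME}` occurrence in `raw` with the value of the
/// corresponding environment variable. (Hypothetical helper; the real
/// crate may interpolate differently.)
fn interpolate(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    let mut rest = raw;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                let name = &after[..end];
                match std::env::var(name) {
                    Ok(val) => out.push_str(&val),
                    // Leave the placeholder untouched if the variable is unset.
                    Err(_) => out.push_str(&rest[start..start + 2 + end + 1]),
                }
                rest = &after[end + 1..];
            }
            None => {
                // No closing brace: keep the remainder verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    let raw = r#"{"headers": {"api_key": "${MY_API_KEY}"}}"#;
    // With MY_API_KEY=s3cr3t in the environment, this prints:
    // {"headers": {"api_key": "s3cr3t"}}
    println!("{}", interpolate(raw));
}
```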
If you decide to use different models from the ones provided by the project's Docker Compose file, be sure to tune the embedding distance and reranking similarity threshold values referenced by the configuration struct.
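For reference, those knobs might look something like this in a serde-deserialized configuration struct; the field names here are hypothetical, not the crate's actual schema:

```rust
use serde::Deserialize;

// Hypothetical subset of the server configuration; the real struct's
// field names and types may differ.
#[derive(Debug, Deserialize)]
struct RetrievalConfig {
    // Maximum pgvector distance for an endpoint description to count as a match.
    max_embedding_distance: f64,
    // Minimum reranker score the top endpoint must reach before it is invoked.
    min_rerank_similarity: f64,
}

fn main() {
    let cfg: RetrievalConfig = serde_json::from_str(
        r#"{"max_embedding_distance": 0.7, "min_rerank_similarity": 0.5}"#,
    )
    .expect("invalid config");
    println!("{cfg:?}");
}
```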
Notable dependencies
- axum for HTTP endpoint definitions
- Diesel for type-safe database interactions
- pgvector-rust for pgvector Rust support
- schemars for JSON Schema generation
- serde and serde_json for serialization and deserialization
- tokio for the async runtime
- Utoipa for OpenAPI docs generation
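For a taste of how the schema pieces fit together, here is a minimal sketch of deriving a JSON Schema with schemars for a request body; CreateNoteRequest is illustrative, not an actual server type:

```rust
use schemars::{schema_for, JsonSchema};
use serde::Deserialize;

// Hypothetical request body for an endpoint; deriving JsonSchema is what
// gives the generation model a machine-readable contract to fill in.
#[derive(Deserialize, JsonSchema)]
struct CreateNoteRequest {
    title: String,
    body: String,
}

fn main() {
    let schema = schema_for!(CreateNoteRequest);
    println!("{}", serde_json::to_string_pretty(&schema).unwrap());
}
```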
How it works
Generally, the flow of a user's request goes as follows:
- A user makes a request to the /chat endpoint
- An embedding API is used for vector search to find server endpoint descriptions similar to the user's most recent message/query
- The vector search results are filtered and reranked using a reranking API
- If the top endpoint result matches the user's query within a threshold, its JSON Schema is used to make an HTTP request for that endpoint using a generation API
- The generated HTTP request is added as an assistant message to the local context
- The generated HTTP request is sent to the top endpoint
- The HTTP response is added as a user message to the local context
- A generation API is used to stream a summarization of the response back to the user
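Put together, the decision logic looks roughly like the sketch below. It is heavily stubbed and hypothetical; none of these names come from the crate, and the real server talks to live embedding, reranking, and generation services instead of returning canned values:

```rust
#[derive(Clone)]
struct Endpoint {
    path: &'static str,
    json_schema: &'static str,
}

// Stub for the embedding API + pgvector search over endpoint descriptions.
fn vector_search(_query: &str) -> Vec<Endpoint> {
    vec![Endpoint {
        path: "/todos",
        json_schema: r#"{"type":"object","properties":{"item":{"type":"string"}}}"#,
    }]
}

// Stub for the reranking API: returns the best candidate and its score.
fn rerank(_query: &str, candidates: &[Endpoint]) -> Option<(Endpoint, f64)> {
    candidates.first().cloned().map(|e| (e, 0.91))
}

// Stub for the generation API: produces a request body matching the schema.
fn generate_request(_query: &str, _schema: &str) -> String {
    r#"{"item":"buy milk"}"#.to_string()
}

fn main() {
    let query = "add 'buy milk' to my todo list";
    let candidates = vector_search(query);

    // Hypothetical similarity threshold; see the configuration section.
    const MIN_SIMILARITY: f64 = 0.5;

    if let Some((top, score)) = rerank(query, &candidates) {
        if score >= MIN_SIMILARITY {
            // The real server would send `body` to `top.path`, record the
            // exchange in the local context, and stream a summary back.
            let body = generate_request(query, top.json_schema);
            println!("POST {} {}", top.path, body);
        }
    }
}
```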
Motivation
In addition to wanting to learn some of the dependencies I used in this project, I've been thinking for a while about building a self-hosted personal assistant that I could use and easily extend myself. Recently, there's been a flurry of AI tool-usage articles, followed by the announcement of the Model Context Protocol (MCP), and now MCP servers are popping up everywhere. Eventually, I couldn't resist the intrusive thought of "well, you could just build type-safe tools using plain ol' HTTP endpoints, OpenAPI schemas, and JSON Schemas".
And so that's what this is.