toi_server

A proof-of-concept for a personal assistant server with type-safe tool search and tool usage via HTTP API endpoints.

Requirements

The server requires the following supporting services:

  • a database that Diesel can connect to via DATABASE_URL
  • HTTP APIs for embedding, generation, and reranking (configured as described below)

The server binary also has some native dependencies, so the Docker image is the easiest way to get started.

Configuration

At least two environment variables are required for configuration:

  • DATABASE_URL: required by Diesel for connecting to the backing database
  • TOI_CONFIG_PATH: path to the server configuration file
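
As a minimal, hypothetical sketch of the startup checks (the crate's actual error handling is not shown here), both variables can be read with the standard library:

use std::env;

fn main() {
    // Hypothetical startup checks; the server's real handling may differ.
    let database_url = env::var("DATABASE_URL")
        .expect("DATABASE_URL must be set for Diesel");
    let config_path = env::var("TOI_CONFIG_PATH")
        .expect("TOI_CONFIG_PATH must point to the server configuration file");
    println!("connecting to {database_url}, loading config from {config_path}");
}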

The server configuration file at the path defined by TOI_CONFIG_PATH should contain HTTP client options for the embedding, generation, and reranking APIs. It also supports environment variable interpolation for some values, so you can write something like this to keep API secrets out of the file:

{
    "server": {
        "bind_addr": "0.0.0.0:6969",
        "user_agent": "${USER_AGENT}"
    },
    "embedding": {
        "base_url": "http://embedding:8000"
    },
    "generation": {
        "base_url": "http://generation:8000",
        "headers": {
            "api_key": "${MY_API_KEY}"
        }
    },
    "reranking": {
        "base_url": "http://reranking:8000"
    }
}
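
A minimal sketch of how such ${VAR} interpolation could work, assuming a regex-based substitution over the raw config text (interpolate_env is a hypothetical helper, not the crate's actual mechanism):

use regex::Regex;
use std::env;

// Hypothetical helper: replace each ${VAR} in the raw config text with the
// value of the corresponding environment variable before parsing the JSON.
fn interpolate_env(raw: &str) -> String {
    let re = Regex::new(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}").unwrap();
    re.replace_all(raw, |caps: &regex::Captures| {
        env::var(&caps[1]).unwrap_or_default()
    })
    .into_owned()
}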

If you decide to use different models from the ones provided by the project's Docker Compose file, be sure to tune the embedding distance and reranking similarity threshold values referenced by the configuration struct.
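
For illustration only, those threshold settings might deserialize into something like the following struct; the field names here are assumptions, not the crate's real configuration struct:

use serde::Deserialize;

// Hypothetical shape of the similarity settings; field names are assumptions.
#[derive(Deserialize)]
struct SimilarityConfig {
    // Maximum embedding distance for a vector-search hit to be kept.
    max_embedding_distance: f64,
    // Minimum reranking score the top endpoint must reach to be invoked.
    min_rerank_similarity: f64,
}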

Notable dependencies

How it works

Generally, the flow of a user's request goes as follows (a rough sketch in code follows the list):

  • A user makes a request to the /chat endpoint
  • An embedding API is used for vector search to find server endpoint descriptions similar to the user's most recent message/query
  • The vector search results are filtered and reranked using a reranking API
  • If the top endpoint result matches the user's query within a threshold, its JSON Schema is used, together with a generation API, to construct an HTTP request for that endpoint
  • The generated HTTP request is added as an assistant message to the local context
  • The generated HTTP request is sent to the top endpoint
  • The HTTP response is added as a user message to the local context
  • A generation API is used to stream a summarization of the response back to the user
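
Here is that loop as a rough, hypothetical Rust sketch; every type, function, and threshold below is a stand-in for the server's real internals, not its actual API:

struct Endpoint {
    json_schema: String,
    url: String,
}

// Stand-ins for the embedding, reranking, and generation API clients.
fn embed(_text: &str) -> Vec<f32> { vec![0.0; 384] }
fn vector_search(_query_embedding: &[f32]) -> Vec<Endpoint> { Vec::new() }
fn rerank(_query: &str, candidates: Vec<Endpoint>) -> Vec<(f32, Endpoint)> {
    candidates.into_iter().map(|e| (0.0, e)).collect()
}
fn generate_request(_schema: &str, _query: &str) -> String { String::new() }
fn send_request(_url: &str, _body: &str) -> String { String::new() }
fn summarize(text: &str) -> String { text.to_owned() }

const RERANK_THRESHOLD: f32 = 0.5; // tuned per model, see Configuration

fn handle_chat(query: &str, context: &mut Vec<(String, String)>) -> String {
    // Embed the user's latest message and vector-search endpoint descriptions.
    let candidates = vector_search(&embed(query));
    // Filter and rerank the vector-search hits.
    let mut ranked = rerank(query, candidates);
    ranked.sort_by(|a, b| b.0.total_cmp(&a.0));
    match ranked.into_iter().next() {
        Some((score, endpoint)) if score >= RERANK_THRESHOLD => {
            // Use the endpoint's JSON Schema to generate an HTTP request.
            let request = generate_request(&endpoint.json_schema, query);
            context.push(("assistant".into(), request.clone()));
            // Send the generated request to the endpoint and record the response.
            let response = send_request(&endpoint.url, &request);
            context.push(("user".into(), response.clone()));
            // Stream a summarization of the response back to the user.
            summarize(&response)
        }
        // No endpoint matched closely enough; respond without a tool call.
        _ => summarize(query),
    }
}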

Motivation

In addition to wanting to learn some of the dependencies I used in this project, I've been thinking for a while now about building a self-hosted personal assistant that I could use and easily extend myself. Recently there's been a flurry of AI tool-usage articles, followed by the announcement of the Model Context Protocol (MCP), and now MCP servers are popping up everywhere. Eventually, I couldn't resist the intrusive thought of "well, you could just build type-safe tools using plain ol' HTTP endpoints, OpenAPI schemas, and JSON Schemas".

And so that's what this is.

Related artifacts
