toi_server
A proof-of-concept for a personal assistant server with type-safe tool search and tool usage via HTTP API endpoints.
Requirements
The server requires the following supporting services:
- A Postgres database with pgvector
- An OpenAI-compliant embedding API
- An OpenAI-compliant chat completions API
- A vLLM reranking API
The server binary also has some native dependencies, so the Docker image is the easiest way to get started.
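If you want to sanity-check the Postgres-with-pgvector requirement from Rust, a minimal smoke test along these lines works (this is purely illustrative and not part of the server; it only assumes Diesel with the postgres feature):

```rust
use diesel::prelude::*;
use diesel::sql_query;

fn main() {
    // Uses the same DATABASE_URL the server expects (see Configuration below).
    let database_url = std::env::var("DATABASE_URL").expect("DATABASE_URL must be set");
    let mut conn =
        PgConnection::establish(&database_url).expect("failed to connect to Postgres");

    // pgvector ships as a Postgres extension; enabling it needs sufficient privileges.
    sql_query("CREATE EXTENSION IF NOT EXISTS vector")
        .execute(&mut conn)
        .expect("pgvector extension is unavailable");

    println!("Postgres with pgvector is reachable");
}
```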
Configuration
At least two environment variables are required for configuration:
- DATABASE_URL: required by Diesel for connecting to the backing database
- TOI_CONFIG_PATH: path to the server configuration file
The server configuration file at the path given by TOI_CONFIG_PATH
should define HTTP client options for the embedding, generation, and
reranking APIs. It also supports environment variable interpolation for some
values, so you can write something like this to keep API secrets out of the file:
```json
{
  "server": {
    "bind_addr": "0.0.0.0:6969",
    "user_agent": "${USER_AGENT}"
  },
  "embedding": {
    "base_url": "http://embedding:8000"
  },
  "generation": {
    "base_url": "http://generation:8000",
    "headers": {
      "api_key": "${MY_API_KEY}"
    }
  },
  "reranking": {
    "base_url": "http://reranking:8000"
  }
}
```
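The interpolation itself is straightforward. Here is a minimal sketch of what a ${VAR} substitution pass over the raw file contents might look like; this is a hypothetical stand-in, not the crate's actual implementation:

```rust
/// Replace every `${NAME}` occurrence in `raw` with the value of the
/// corresponding environment variable. (Hypothetical helper; the real
/// crate may interpolate differently.)
fn interpolate(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    let mut rest = raw;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                let name = &after[..end];
                match std::env::var(name) {
                    Ok(val) => out.push_str(&val),
                    // Leave the placeholder untouched if the variable is unset.
                    Err(_) => out.push_str(&rest[start..start + 2 + end + 1]),
                }
                rest = &after[end + 1..];
            }
            None => {
                // No closing brace: keep the remainder verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    let raw = r#"{"headers": {"api_key": "${MY_API_KEY}"}}"#;
    // With MY_API_KEY=s3cr3t in the environment, this prints:
    // {"headers": {"api_key": "s3cr3t"}}
    println!("{}", interpolate(raw));
}
```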
If you decide to use different models from the ones provided by the project's Docker Compose file, be sure to tune the embedding distance and reranking similarity threshold values referenced by the configuration struct.
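For reference, those knobs might look something like this in a serde-deserialized configuration struct; the field names here are hypothetical, not the crate's actual schema:

```rust
use serde::Deserialize;

// Hypothetical subset of the server configuration; the real struct's
// field names and types may differ.
#[derive(Debug, Deserialize)]
struct RetrievalConfig {
    // Maximum pgvector distance for an endpoint description to count as a match.
    max_embedding_distance: f64,
    // Minimum reranker score the top endpoint must reach before it is invoked.
    min_rerank_similarity: f64,
}

fn main() {
    let cfg: RetrievalConfig = serde_json::from_str(
        r#"{"max_embedding_distance": 0.7, "min_rerank_similarity": 0.5}"#,
    )
    .expect("invalid config");
    println!("{cfg:?}");
}
```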
Notable dependencies
- axum for HTTP endpoint definitions
- Diesel for type-safe database interactions
- pgvector-rust for pgvector Rust support
- schemars for JSON Schema generation
- serde and serde_json for serialization and deserialization
- tokio for the async runtime
- Utoipa for OpenAPI docs generation
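For a taste of how the schema pieces fit together, here is a minimal sketch of deriving a JSON Schema with schemars for a request body; CreateNoteRequest is illustrative, not an actual server type:

```rust
use schemars::{schema_for, JsonSchema};
use serde::Deserialize;

// Hypothetical request body for an endpoint; deriving JsonSchema is what
// gives the generation model a machine-readable contract to fill in.
#[derive(Deserialize, JsonSchema)]
struct CreateNoteRequest {
    title: String,
    body: String,
}

fn main() {
    let schema = schema_for!(CreateNoteRequest);
    println!("{}", serde_json::to_string_pretty(&schema).unwrap());
}
```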
How it works
Generally, the flow of a user's request goes as follows:
- A user makes a request to the /chat endpoint
- An embedding API is used for vector search to find server endpoint descriptions similar to the user's most recent message/query
- The vector search results are filtered and reranked using a reranking API
- If the top endpoint result matches the user's query within a threshold, its JSON Schema is used to make an HTTP request for that endpoint using a generation API
- The generated HTTP request is added as an assistant message to the local context
- The generated HTTP request is sent to the top endpoint
- The HTTP response is added as a user message to the local context
- A generation API is used to stream a summarization of the response back to the user
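Put together, the decision logic looks roughly like the sketch below. It is heavily stubbed and hypothetical; none of these names come from the crate, and the real server talks to live embedding, reranking, and generation services instead of returning canned values:

```rust
#[derive(Clone)]
struct Endpoint {
    path: &'static str,
    json_schema: &'static str,
}

// Stub for the embedding API + pgvector search over endpoint descriptions.
fn vector_search(_query: &str) -> Vec<Endpoint> {
    vec![Endpoint {
        path: "/todos",
        json_schema: r#"{"type":"object","properties":{"item":{"type":"string"}}}"#,
    }]
}

// Stub for the reranking API: returns the best candidate and its score.
fn rerank(_query: &str, candidates: &[Endpoint]) -> Option<(Endpoint, f64)> {
    candidates.first().cloned().map(|e| (e, 0.91))
}

// Stub for the generation API: produces a request body matching the schema.
fn generate_request(_query: &str, _schema: &str) -> String {
    r#"{"item":"buy milk"}"#.to_string()
}

fn main() {
    let query = "add 'buy milk' to my todo list";
    let candidates = vector_search(query);

    // Hypothetical similarity threshold; see the configuration section.
    const MIN_SIMILARITY: f64 = 0.5;

    if let Some((top, score)) = rerank(query, &candidates) {
        if score >= MIN_SIMILARITY {
            // The real server would send `body` to `top.path`, record the
            // exchange in the local context, and stream a summary back.
            let body = generate_request(query, top.json_schema);
            println!("POST {} {}", top.path, body);
        }
    }
}
```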
Motivation
In addition to wanting to learn some of the dependencies I used in this project, I've been thinking for a while about building a self-hosted personal assistant that I could use and easily extend myself. Recently, there's been a flurry of AI tool-usage articles, followed by the announcement of the Model Context Protocol (MCP), and now MCP servers are popping up everywhere. Eventually, I couldn't resist the intrusive thought of "well, you could just build type-safe tools using plain ol' HTTP endpoints, OpenAPI schemas, and JSON Schemas".
And so that's what this is.