5 releases (uses the Rust 2024 edition)

Version | Date |
---|---|
0.1.10 | Mar 27, 2025 |
0.1.9 | Mar 27, 2025 |
0.1.8 | Mar 27, 2025 |
0.1.6 | Mar 27, 2025 |
0.1.5 | Mar 27, 2025 |
RStructor: Structured LLM Outputs for Rust
RStructor is a Rust library for extracting structured data from Large Language Models (LLMs) with built-in validation. Define your schemas as Rust structs/enums, and RStructor will handle the rest—generating JSON Schemas, communicating with LLMs, parsing responses, and validating the results.
Think of it as the Rust equivalent of Instructor + Pydantic for Python, bringing the same structured output capabilities to the Rust ecosystem.
✨ Features
- 📝 Type-Safe Definitions: Define data models as standard Rust structs/enums with attributes
- 🔄 JSON Schema Generation: Auto-generates JSON Schema from your Rust types
- ✅ Built-in Validation: Type checking plus custom business rule validation
- 🔌 Multiple LLM Providers: Support for OpenAI and Anthropic, with an extensible backend system
- 🧩 Complex Data Structures: Support for nested objects, arrays, and optional fields
- 🔍 Custom Validation Rules: Add domain-specific validation for reliable results
- 🔁 Async API: Fully asynchronous API for efficient operations
- ⚙️ Builder Pattern: Fluent API for configuring LLM clients
- 📊 Feature Flags: Optional backends via feature flags
📦 Installation
Add RStructor to your Cargo.toml:
[dependencies]
rstructor = "0.1.0"
serde = { version = "1.0", features = ["derive"] }
tokio = { version = "1.0", features = ["rt-multi-thread", "macros"] }
🚀 Quick Start
Here's a simple example of extracting structured information about a movie from an LLM:
use rstructor::{LLMModel, LLMClient, OpenAIClient, OpenAIModel};
use serde::{Serialize, Deserialize};
use std::env;

// Define your data model
#[derive(LLMModel, Serialize, Deserialize, Debug)]
struct Movie {
    #[llm(description = "Title of the movie")]
    title: String,
    #[llm(description = "Director of the movie")]
    director: String,
    #[llm(description = "Year the movie was released", example = 2010)]
    year: u16,
    #[llm(description = "Brief plot summary")]
    plot: String,
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Get API key from environment
    let api_key = env::var("OPENAI_API_KEY")?;

    // Create an OpenAI client
    let client = OpenAIClient::new(api_key)?
        .model(OpenAIModel::Gpt35Turbo)
        .temperature(0.0)
        .build();

    // Generate structured information with a simple prompt
    let movie: Movie = client.generate_struct("Tell me about the movie Inception").await?;

    // Use the structured data
    println!("Title: {}", movie.title);
    println!("Director: {}", movie.director);
    println!("Year: {}", movie.year);
    println!("Plot: {}", movie.plot);

    Ok(())
}
📝 Detailed Examples
Basic Example with Validation
Add custom validation rules to enforce business logic beyond type checking:
use rstructor::{LLMModel, LLMClient, OpenAIClient, OpenAIModel, RStructorError, Result};
use serde::{Serialize, Deserialize};

#[derive(LLMModel, Serialize, Deserialize, Debug)]
#[llm(description = "Information about a movie")]
struct Movie {
    #[llm(description = "Title of the movie")]
    title: String,
    #[llm(description = "Year the movie was released", example = 2010)]
    year: u16,
    #[llm(description = "IMDB rating out of 10", example = 8.5)]
    rating: f32,
}

// Add custom validation
impl Movie {
    fn validate(&self) -> Result<()> {
        // Title can't be empty
        if self.title.trim().is_empty() {
            return Err(RStructorError::ValidationError(
                "Movie title cannot be empty".to_string()
            ));
        }

        // Year must be in a reasonable range
        if self.year < 1888 || self.year > 2030 {
            return Err(RStructorError::ValidationError(
                format!("Movie year must be between 1888 and 2030, got {}", self.year)
            ));
        }

        // Rating must be between 0 and 10
        if self.rating < 0.0 || self.rating > 10.0 {
            return Err(RStructorError::ValidationError(
                format!("Rating must be between 0 and 10, got {}", self.rating)
            ));
        }

        Ok(())
    }
}
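To try these rules without calling an LLM, here is a minimal sketch that uses only the standard library. `ValidationError` is a hypothetical stand-in for `RStructorError::ValidationError`, and the `Movie` struct drops the derive macros so the snippet compiles on its own; the three checks mirror the ones above.

```rust
// Stand-in for RStructorError::ValidationError (assumption: the real error
// type lives in the rstructor crate; this local type only carries the message).
#[derive(Debug, PartialEq)]
struct ValidationError(String);

#[derive(Debug)]
struct Movie {
    title: String,
    year: u16,
    rating: f32,
}

impl Movie {
    // Same three rules as the example above.
    fn validate(&self) -> Result<(), ValidationError> {
        if self.title.trim().is_empty() {
            return Err(ValidationError("Movie title cannot be empty".to_string()));
        }
        if self.year < 1888 || self.year > 2030 {
            return Err(ValidationError(format!(
                "Movie year must be between 1888 and 2030, got {}",
                self.year
            )));
        }
        if self.rating < 0.0 || self.rating > 10.0 {
            return Err(ValidationError(format!(
                "Rating must be between 0 and 10, got {}",
                self.rating
            )));
        }
        Ok(())
    }
}

fn main() {
    let good = Movie { title: "Inception".to_string(), year: 2010, rating: 8.8 };
    let bad = Movie { title: "Inception".to_string(), year: 1800, rating: 8.8 };

    // The first value passes every rule; the second fails the year check.
    println!("{:?}", good.validate());
    println!("{:?}", bad.validate());
}
```

One thing to watch: a `NaN` rating passes the two comparisons as written, because every comparison with `NaN` is false. Writing the check as `!(0.0..=10.0).contains(&self.rating)` would reject it.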
Complex Nested Structures
RStructor supports complex nested data structures:
use rstructor::{LLMModel, LLMClient, OpenAIClient, OpenAIModel};
use serde::{Serialize, Deserialize};

// Define a nested data model for a recipe
#[derive(LLMModel, Serialize, Deserialize, Debug)]
struct Ingredient {
    #[llm(description = "Name of the ingredient", example = "flour")]
    name: String,
    #[llm(description = "Amount of the ingredient", example = 2.5)]
    amount: f32,
    #[llm(description = "Unit of measurement", example = "cups")]
    unit: String,
}

#[derive(LLMModel, Serialize, Deserialize, Debug)]
struct Step {
    #[llm(description = "Order number of this step", example = 1)]
    number: u16,
    #[llm(description = "Description of this step",
          example = "Mix the flour and sugar together")]
    description: String,
}

#[derive(LLMModel, Serialize, Deserialize, Debug)]
#[llm(description = "A cooking recipe with ingredients and instructions")]
struct Recipe {
    #[llm(description = "Name of the recipe", example = "Chocolate Chip Cookies")]
    name: String,
    #[llm(description = "List of ingredients needed")]
    ingredients: Vec<Ingredient>,
    #[llm(description = "Step-by-step cooking instructions")]
    steps: Vec<Step>,
}

// Usage:
// let recipe: Recipe = client.generate_struct("Give me a recipe for chocolate chip cookies").await?;
Working with Enums
RStructor supports both simple enums and enums with associated data.
Simple Enums
Use enums for categorical data:
use rstructor::{LLMModel, LLMClient, AnthropicClient, AnthropicModel};
use serde::{Serialize, Deserialize};

// Define an enum for sentiment analysis
#[derive(LLMModel, Serialize, Deserialize, Debug)]
#[llm(description = "The sentiment of a text")]
enum Sentiment {
    #[llm(description = "Positive or favorable sentiment")]
    Positive,
    #[llm(description = "Negative or unfavorable sentiment")]
    Negative,
    #[llm(description = "Neither clearly positive nor negative")]
    Neutral,
}

#[derive(LLMModel, Serialize, Deserialize, Debug)]
struct SentimentAnalysis {
    #[llm(description = "The text to analyze")]
    text: String,
    #[llm(description = "The detected sentiment of the text")]
    sentiment: Sentiment,
    #[llm(description = "Confidence score between 0.0 and 1.0",
          example = 0.85)]
    confidence: f32,
}

// Usage:
// let analysis: SentimentAnalysis = client.generate_struct("Analyze the sentiment of: I love this product!").await?;
Enums with Associated Data (Tagged Unions)
RStructor also supports more complex enums with associated data:
use rstructor::{LLMModel, SchemaType};
use serde::{Deserialize, Serialize};

// Enum with different types of associated data
#[derive(LLMModel, Serialize, Deserialize, Debug)]
enum UserStatus {
    #[llm(description = "The user is online")]
    Online,
    #[llm(description = "The user is offline")]
    Offline,
    #[llm(description = "The user is away with an optional message")]
    Away(String),
    #[llm(description = "The user is busy until a specific time in minutes")]
    Busy(u32),
}

// Using struct variants for more complex associated data
#[derive(LLMModel, Serialize, Deserialize, Debug)]
enum PaymentMethod {
    #[llm(description = "Payment with credit card")]
    Card {
        #[llm(description = "Credit card number")]
        number: String,
        #[llm(description = "Expiration date in MM/YY format")]
        expiry: String,
    },
    #[llm(description = "Payment via PayPal account")]
    PayPal(String),
    #[llm(description = "Payment will be made on delivery")]
    CashOnDelivery,
}

// Usage:
// let user_status: UserStatus = client.generate_struct("What's the user's status?").await?;
When serialized to JSON, these enum variants with data become tagged unions:
// UserStatus::Away("Back in 10 minutes")
{
  "Away": "Back in 10 minutes"
}

// PaymentMethod::Card { number: "4111...", expiry: "12/25" }
{
  "Card": {
    "number": "4111 1111 1111 1111",
    "expiry": "12/25"
  }
}
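The JSON shapes above follow serde's default "externally tagged" enum layout: a variant with data becomes a single-key object, while a unit variant such as `Online` becomes a plain string. The sketch below reproduces that layout by hand with the standard library only, purely as an illustration. `to_tagged_json` and `json_string` are hypothetical helpers, not part of RStructor or serde, and real code would rely on the derived `Serialize` implementation instead.

```rust
// Illustration only: the real serialization is done by serde's derive, not by
// hand-written code like this.
#[derive(Debug)]
enum UserStatus {
    Online,
    Offline,
    Away(String),
    Busy(u32),
}

// Escape the two characters that would break a JSON string literal in this demo
// (a full JSON encoder would also handle control characters).
fn json_string(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

// Hypothetical helper mirroring the externally tagged layout.
fn to_tagged_json(status: &UserStatus) -> String {
    match status {
        // Unit variants become a bare JSON string.
        UserStatus::Online => json_string("Online"),
        UserStatus::Offline => json_string("Offline"),
        // Variants with data become a single-key object: {"Variant": data}.
        UserStatus::Away(msg) => format!("{{\"Away\":{}}}", json_string(msg)),
        UserStatus::Busy(minutes) => format!("{{\"Busy\":{}}}", minutes),
    }
}

fn main() {
    let statuses = vec![
        UserStatus::Online,
        UserStatus::Offline,
        UserStatus::Away("Back in 10 minutes".to_string()),
        UserStatus::Busy(30),
    ];
    for status in &statuses {
        println!("{:?} -> {}", status, to_tagged_json(status));
    }
}
```

Knowing the layout helps when reading raw model output during debugging: a response of `{"Away": "..."}` maps to `UserStatus::Away`, and a bare `"Online"` maps to the unit variant.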
Configuring Different LLM Providers
Choose between different providers:
// Using OpenAI
let openai_client = OpenAIClient::new(openai_api_key)?
    .model(OpenAIModel::Gpt4)
    .temperature(0.2)
    .max_tokens(1500)
    .build();

// Using Anthropic
let anthropic_client = AnthropicClient::new(anthropic_api_key)?
    .model(AnthropicModel::Claude3Sonnet)
    .temperature(0.0)
    .max_tokens(2000)
    .build();
Handling Container-Level Attributes
Add metadata and examples at the container level:
#[derive(LLMModel, Serialize, Deserialize, Debug)]
#[llm(description = "Detailed information about a movie",
      title = "MovieDetails",
      examples = [
        ::serde_json::json!({
            "title": "The Matrix",
            "director": "Lana and Lilly Wachowski",
            "year": 1999,
            "genres": ["Sci-Fi", "Action"],
            "rating": 8.7,
            "plot": "A computer hacker learns from mysterious rebels about the true nature of his reality and his role in the war against its controllers."
        })
      ])]
struct Movie {
    // fields...
}
📚 API Reference
LLMModel Trait
The LLMModel trait is the core of RStructor. It's implemented automatically via the derive macro and provides schema generation and validation:

pub trait LLMModel: SchemaType + DeserializeOwned + Serialize {
    fn validate(&self) -> Result<()> {
        Ok(())
    }
}

Override the validate method to add custom validation logic.
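The default-method pattern behind `validate` can be sketched with the standard library alone. `Validate`, `Note`, and `Celsius` below are hypothetical names, and the error type is simplified to `String`; the real trait also carries the `SchemaType` and serde bounds shown above.

```rust
// Simplified stand-in for the LLMModel trait: only the validation hook,
// with String as the error type (assumption made for this sketch).
trait Validate {
    // Default: accept everything, like the trait's default `validate`.
    fn validate(&self) -> Result<(), String> {
        Ok(())
    }
}

struct Note;

// No override: Note inherits the permissive default.
impl Validate for Note {}

struct Celsius(f64);

// Override: reject temperatures below absolute zero.
impl Validate for Celsius {
    fn validate(&self) -> Result<(), String> {
        if self.0 < -273.15 {
            return Err(format!("{} degrees C is below absolute zero", self.0));
        }
        Ok(())
    }
}

// Generic code only needs the trait bound, not the concrete type.
fn check<T: Validate>(value: &T) -> bool {
    value.validate().is_ok()
}

fn main() {
    println!("note ok: {}", check(&Note));
    println!("21.5 ok: {}", check(&Celsius(21.5)));
    println!("-300 ok: {}", check(&Celsius(-300.0)));
}
```

Types that need no extra rules pay nothing for the hook, while types with domain constraints opt in by overriding a single method.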
LLMClient Trait
The LLMClient trait defines the interface for all LLM providers:

#[async_trait]
pub trait LLMClient {
    async fn generate_struct<T>(&self, prompt: &str) -> Result<T>
    where
        T: LLMModel + DeserializeOwned + Send + 'static;

    async fn generate(&self, prompt: &str) -> Result<String>;
}
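Because application code can depend on a trait rather than on a concrete provider, a fake backend is enough for unit tests. The sketch below is a simplified, synchronous stand-in rather than the crate's API: `TextClient`, `CannedClient`, and `summarize` are hypothetical names, errors are plain `String`s, and the real `LLMClient` trait is async and also exposes `generate_struct`.

```rust
// Synchronous stand-in for the text half of LLMClient (assumption: the real
// trait is async and returns the crate's own Result type).
trait TextClient {
    fn generate(&self, prompt: &str) -> Result<String, String>;
}

// Test double that returns a fixed reply and never touches the network.
struct CannedClient {
    reply: String,
}

impl TextClient for CannedClient {
    fn generate(&self, prompt: &str) -> Result<String, String> {
        if prompt.trim().is_empty() {
            return Err("prompt must not be empty".to_string());
        }
        Ok(self.reply.clone())
    }
}

// Application code is written against the trait, so any backend fits.
fn summarize(client: &dyn TextClient, text: &str) -> Result<String, String> {
    client.generate(&format!("Summarize in one sentence: {}", text))
}

fn main() {
    let client = CannedClient {
        reply: "A thief plants an idea in a dream.".to_string(),
    };
    match summarize(&client, "Inception (2010)") {
        Ok(summary) => println!("{}", summary),
        Err(err) => eprintln!("error: {}", err),
    }
}
```

The same shape carries over to the async trait: a test double implementing `LLMClient` could stand in for `OpenAIClient` or `AnthropicClient` wherever a function accepts the trait.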
Supported Attributes
Field Attributes
- description: Text description of the field
- example: A single example value
- examples: Multiple example values
Container Attributes
- description: Text description of the struct or enum
- title: Custom title for the JSON Schema
- examples: Example instances as JSON objects
🔧 Feature Flags
Configure RStructor with feature flags:
[dependencies]
rstructor = { version = "0.1.0", features = ["openai", "anthropic"] }
Available features:
- openai: Include the OpenAI client
- anthropic: Include the Anthropic client
- derive: Include the derive macro (enabled by default)
📋 Examples
See the examples/ directory for complete, working examples:
- structured_movie_info.rs: Basic example of getting movie information with validation
- nested_objects_example.rs: Working with complex nested structures for recipe data
- news_article_categorizer.rs: Using enums for categorization
- enum_with_data_example.rs: Working with enums that have associated data (tagged unions)
- event_planner.rs: Interactive event planning with user input
- weather_example.rs: Simple model with validation demonstration
▶️ Running the Examples
# Set environment variables
export OPENAI_API_KEY=your_openai_key_here
# or
export ANTHROPIC_API_KEY=your_anthropic_key_here
# Run examples
cargo run --example structured_movie_info
cargo run --example news_article_categorizer
🛣️ Roadmap
- Core traits and interfaces
- OpenAI backend implementation
- Anthropic backend implementation
- Procedural macro for deriving LLMModel
- Schema generation functionality
- Custom validation capabilities
- Support for nested structures
- Rich validation API with custom domain rules
- Support for enums with associated data (tagged unions)
- Streaming responses
- Support for additional LLM providers
- Integration with web frameworks (Axum, Actix)
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
👥 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.