1 unstable release: 0.1.0 (Oct 22, 2025)
ContRAG Rust - ICP Edition
Bring Retrieval-Augmented Generation (RAG) to Web3 with ICP Canisters
ContRAG Rust is a library for building RAG systems directly on the Internet Computer Protocol (ICP). Store entity data on-chain, generate embeddings via HTTP outcalls, and build context for AI applications, all within your canister.
Key Features
- On-Chain RAG: Build RAG systems entirely on ICP with stable memory storage
- External AI Integration: Use OpenAI, Gemini, or custom embedders via HTTP outcalls
- Flexible Data Sources: Read from canister state, stable memory, or inter-canister calls
- Web3-Native: Designed specifically for blockchain data and Web3 applications
- Zero Database Dependencies: No PostgreSQL, MongoDB, or external vector DBs required
- Developer-Friendly: Simple configuration with `.env` for secrets, JSON for schemas
Installation
Add to your canister's Cargo.toml:
[dependencies]
contrag-core = { git = "https://github.com/dhaniverse/contrag", branch = "main" }
ic-cdk = "0.13"
ic-cdk-macros = "0.13"
candid = "0.10"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
Quick Start
1. Define Your Entities
use contrag_core::prelude::*;
use candid::{CandidType, Deserialize};
use serde::Serialize;

#[derive(Clone, Debug, CandidType, Serialize, Deserialize)]
pub struct User {
    pub id: String,
    pub name: String,
    pub email: String,
    pub order_ids: Vec<String>,
}

impl RagEntity for User {
    fn entity_type() -> &'static str {
        "User"
    }

    fn entity_id(&self) -> String {
        self.id.clone()
    }

    fn to_context_map(&self) -> Vec<(String, String)> {
        vec![
            ("id".to_string(), self.id.clone()),
            ("name".to_string(), self.name.clone()),
            ("email".to_string(), self.email.clone()),
        ]
    }

    fn relationships(&self) -> Vec<EntityRelationship> {
        self.order_ids
            .iter()
            .map(|id| EntityRelationship {
                field_name: "orders".to_string(),
                target_entity_type: "Order".to_string(),
                target_id: id.clone(),
                relationship_type: RelationshipType::OneToMany,
            })
            .collect()
    }
}
2. Create Configuration
Create contrag.config.json:
{
  "entities": [
    {
      "name": "User",
      "canister_id": "your-canister-id",
      "fetch_method": "get_user",
      "relationships": [],
      "auto_include": true
    }
  ],
  "embedder": {
    "provider": "openai",
    "model": "text-embedding-3-small",
    "dimensions": 1536
  },
  "chunking": {
    "chunk_size": 1000,
    "overlap": 100,
    "include_field_names": true
  },
  "vector_store": {
    "storage_type": "stable_memory",
    "enable_cache": true
  }
}
Create .env (DO NOT commit):
OPENAI_API_KEY=sk-your-key-here
3. Build RAG Context
use ic_cdk_macros::*;
use contrag_core::prelude::*;
use contrag_core::embedders::openai::OpenAIEmbedder;
use contrag_core::vector_store::stable_memory_store::StableMemoryVectorStore;

#[update]
async fn build_rag_context(user_id: String, api_key: String) -> Result<String, String> {
    // Load configuration (embedded into the Wasm at compile time)
    let config_json = include_str!("../contrag.config.json");
    let config = contrag_core::config::load_config_from_json(config_json)
        .map_err(|e| e.to_string())?;

    // Get user data; `get_user` is your canister's own lookup function
    let user = get_user(user_id.clone())
        .ok_or_else(|| "User not found".to_string())?;

    // Build context
    let context_builder = ContextBuilder::new(config.chunking.clone());
    let context = context_builder.build_entity_context(&user);
    let chunks = context_builder.chunk_text(&context);

    // Generate embeddings (one embedding per chunk, in chunk order)
    let embedder = OpenAIEmbedder::new(api_key, config.embedder.model.clone());
    let texts: Vec<String> = chunks.iter().map(|c| c.text.clone()).collect();
    let embeddings = embedder.embed(texts)
        .await
        .map_err(|e| e.to_string())?;

    // Store vectors
    let mut vector_store = StableMemoryVectorStore::new();
    let namespace = format!("User:{}", user_id);
    for (idx, (chunk, embedding)) in chunks.iter().zip(embeddings).enumerate() {
        let vector = Vector {
            id: format!("{}::chunk_{}", user_id, idx),
            embedding,
            text: chunk.text.clone(),
            metadata: VectorMetadata {
                entity_type: "User".to_string(),
                entity_id: user_id.clone(),
                chunk_index: idx,
                total_chunks: chunks.len(),
                timestamp: ic_cdk::api::time(),
                custom: None,
            },
        };
        vector_store.store(&namespace, vector).await.map_err(|e| e.to_string())?;
    }

    Ok(format!("Built context with {} chunks", chunks.len()))
}
4. Search Context
#[update]
async fn search_context(
    user_id: String,
    query: String,
    api_key: String,
) -> Result<Vec<SearchResult>, String> {
    let config_json = include_str!("../contrag.config.json");
    let config = contrag_core::config::load_config_from_json(config_json)
        .map_err(|e| e.to_string())?;

    // Generate query embedding
    let embedder = OpenAIEmbedder::new(api_key, config.embedder.model.clone());
    let query_embeddings = embedder.embed(vec![query])
        .await
        .map_err(|e| e.to_string())?;
    let query_embedding = query_embeddings
        .into_iter()
        .next()
        .ok_or_else(|| "No embedding generated".to_string())?;

    // Search vector store
    let vector_store = StableMemoryVectorStore::new();
    let namespace = format!("User:{}", user_id);
    vector_store.search(&namespace, query_embedding, 5)
        .await
        .map_err(|e| e.to_string())
}
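Conceptually, a search without an index (see Limitations) scores every stored vector against the query and keeps the best k. The following is a minimal, std-only sketch of that ranking step. `top_k` and its signature are hypothetical and are not the crate's API; the scoring function is passed in as a closure so any similarity metric can be plugged in.

```rust
/// Hypothetical brute-force ranking: score every stored vector, sort by
/// descending score, and keep the first `k` results.
fn top_k<'a, F>(stored: &'a [(String, Vec<f32>)], k: usize, score: F) -> Vec<(&'a str, f32)>
where
    F: Fn(&[f32]) -> f32,
{
    let mut scored: Vec<(&'a str, f32)> = stored
        .iter()
        .map(|(id, embedding)| (id.as_str(), score(embedding.as_slice())))
        .collect();
    // Highest score first; total_cmp gives a total order even if a score is NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

Every query touches every vector in the namespace, so the cost grows linearly with the number of stored chunks.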
Architecture
┌───────────────────────────────────────┐
│       Your ICP Canister (Rust)        │
│                                       │
│  ┌─────────────────────────────────┐  │
│  │      contrag-core Library       │  │
│  │                                 │  │
│  │  • RagEntity Trait              │  │
│  │  • ContextBuilder               │  │
│  │  • StableMemoryVectorStore      │  │
│  │  • Entity Relationships         │  │
│  └─────────────────────────────────┘  │
└───────────────────────────────────────┘
          │                   │
          ▼                   ▼
┌─────────────────┐  ┌──────────────────────┐
│  HTTP Outcalls  │  │    Stable Memory     │
│                 │  │                      │
│  • OpenAI       │  │  • On-chain vectors  │
│  • Gemini       │  │  • Persistent state  │
│  • Custom APIs  │  │  • Upgrade-safe      │
└─────────────────┘  └──────────────────────┘
Core Concepts
RagEntity Trait
Implement this trait on your canister data structures to enable RAG:
pub trait RagEntity {
    fn entity_type() -> &'static str;                    // e.g., "User", "Order"
    fn entity_id(&self) -> String;                       // Unique ID
    fn to_context_map(&self) -> Vec<(String, String)>;   // Flatten to key-value
    fn relationships(&self) -> Vec<EntityRelationship>;  // Related entities
}
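As a second illustration of the trait, here is a sketch of an `Order` implementation. To compile without the crate, it declares local stand-ins that mirror the trait and the `EntityRelationship` fields shown in this README. The `Order` fields and the `Item` target entity are hypothetical.

```rust
// Local stand-ins mirroring the types shown in this README, so the example
// compiles without `contrag-core`. In a canister, import them from the crate.
#[derive(Clone, Debug, PartialEq)]
enum RelationshipType {
    OneToMany,
}

#[derive(Clone, Debug, PartialEq)]
struct EntityRelationship {
    field_name: String,
    target_entity_type: String,
    target_id: String,
    relationship_type: RelationshipType,
}

trait RagEntity {
    fn entity_type() -> &'static str;
    fn entity_id(&self) -> String;
    fn to_context_map(&self) -> Vec<(String, String)>;
    fn relationships(&self) -> Vec<EntityRelationship>;
}

/// Hypothetical entity; the field names are illustrative only.
struct Order {
    id: String,
    total_cents: u64,
    item_ids: Vec<String>,
}

impl RagEntity for Order {
    fn entity_type() -> &'static str {
        "Order"
    }

    fn entity_id(&self) -> String {
        self.id.clone()
    }

    // Non-string fields are converted to strings for the context map.
    fn to_context_map(&self) -> Vec<(String, String)> {
        vec![
            ("id".to_string(), self.id.clone()),
            ("total_cents".to_string(), self.total_cents.to_string()),
        ]
    }

    // One relationship entry per referenced item.
    fn relationships(&self) -> Vec<EntityRelationship> {
        self.item_ids
            .iter()
            .map(|id| EntityRelationship {
                field_name: "items".to_string(),
                target_entity_type: "Item".to_string(),
                target_id: id.clone(),
                relationship_type: RelationshipType::OneToMany,
            })
            .collect()
    }
}
```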
Context Building
let builder = ContextBuilder::new(config.chunking);
// Single entity
let context = builder.build_entity_context(&user);
// Entity with relationships
let related = vec![order1_context, order2_context];
let graph_context = builder.build_graph_context(&user, related);
// Chunk long text
let chunks = builder.chunk_text(&graph_context);
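To make the idea of a graph context concrete, the sketch below renders a context map as "field: value" lines and appends related contexts after the root entity. Both helpers are hypothetical, std-only illustrations. The separator and layout are assumptions, and the crate's `ContextBuilder` may format its output differently.

```rust
/// Hypothetical helper: one "key: value" line per field.
fn render_fields(fields: &[(String, String)]) -> String {
    fields
        .iter()
        .map(|(key, value)| format!("{}: {}", key, value))
        .collect::<Vec<_>>()
        .join("\n")
}

/// Hypothetical helper: the root context followed by each related context,
/// separated by a "---" line (the separator is an arbitrary choice).
fn render_graph(root: &str, related: &[String]) -> String {
    let mut out = root.to_string();
    for context in related {
        out.push_str("\n---\n");
        out.push_str(context);
    }
    out
}
```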
Vector Storage
let mut store = StableMemoryVectorStore::new();
// Store vector
store.store(namespace, vector).await?;
// Search
let results = store.search(namespace, query_embedding, k).await?;
// Manage
store.count(namespace).await?;
store.delete_namespace(namespace).await?;
Configuration
Entity Configuration
{
  "name": "User",
  "canister_id": "rrkah-fqaaa-aaaaa-aaaaq-cai",
  "fetch_method": "get_user",
  "fetch_many_method": "list_users",
  "relationships": [
    {
      "field_name": "order_ids",
      "target_entity": "Order",
      "relationship_type": "one_to_many"
    }
  ],
  "auto_include": true
}
Embedder Configuration
OpenAI:
{
"provider": "openai",
"model": "text-embedding-3-small",
"dimensions": 1536
}
Gemini:
{
"provider": "gemini",
"model": "embedding-001",
"dimensions": 768
}
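Both example configurations declare a `dimensions` value. A mismatch between that value and the length of the vectors a provider actually returns is easy to miss, so a guard like the one below can help. `check_dimensions` is a hypothetical helper, not a `contrag-core` function.

```rust
/// Hypothetical guard: returns an error naming the first embedding whose
/// length differs from the configured `expected_dims`.
fn check_dimensions(embeddings: &[Vec<f32>], expected_dims: usize) -> Result<(), String> {
    for (idx, embedding) in embeddings.iter().enumerate() {
        if embedding.len() != expected_dims {
            return Err(format!(
                "embedding {} has {} dimensions, expected {}",
                idx,
                embedding.len(),
                expected_dims
            ));
        }
    }
    Ok(())
}
```

Running such a check before storing vectors keeps vectors of different lengths out of the same namespace, where similarity scores between them would not be meaningful.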
Chunking Configuration
{
  "chunk_size": 1000,
  "overlap": 100,
  "include_field_names": true
}
- chunk_size: characters per chunk
- overlap: characters shared between consecutive chunks
- include_field_names: emit each field in "field: value" format
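The interaction between `chunk_size` and `overlap` is easiest to see in code. The following is a minimal, std-only sketch of character-based chunking with overlap. It is an illustration of the technique, not the crate's `chunk_text` implementation, which may split on different boundaries.

```rust
/// Splits `text` into chunks of at most `chunk_size` characters, where each
/// chunk starts with the last `overlap` characters of the previous one.
/// Operates on `char`s rather than bytes, so multi-byte UTF-8 is never split.
fn chunk_text(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");

    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap; // how far the window advances each time
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // the final chunk reached the end of the text
        }
        start += step;
    }
    chunks
}
```

With `chunk_size = 1000` and `overlap = 100`, the window advances 900 characters at a time, so a larger overlap produces more chunks and therefore more embedding requests.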
Advanced Features
Cached Embeddings
use contrag_core::embedders::{CachedEmbedder, Embedder};
use contrag_core::embedders::openai::OpenAIEmbedder;
let embedder = OpenAIEmbedder::new(api_key, model);
let mut cached = CachedEmbedder::new(embedder, 1000); // Cache up to 1000 entries
let embeddings = cached.embed_with_cache(texts).await?;
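The idea behind a bounded embedding cache can be sketched with the standard library alone. `EmbeddingCache` below is hypothetical and synchronous, and it uses first-in-first-out eviction as an assumption. The eviction policy of the crate's `CachedEmbedder` is not documented here and may differ.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical bounded cache keyed by input text, with FIFO eviction.
struct EmbeddingCache {
    capacity: usize,
    entries: HashMap<String, Vec<f32>>,
    order: VecDeque<String>, // insertion order, oldest key at the front
}

impl EmbeddingCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Returns the cached embedding for `text`, or computes it with `embed`,
    /// stores it (evicting the oldest entry if the cache is full), and returns it.
    fn get_or_embed<F>(&mut self, text: &str, embed: F) -> Vec<f32>
    where
        F: FnOnce(&str) -> Vec<f32>,
    {
        if let Some(hit) = self.entries.get(text) {
            return hit.clone();
        }
        let embedding = embed(text);
        if self.capacity == 0 {
            return embedding; // caching disabled
        }
        if self.entries.len() >= self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
        self.entries.insert(text.to_string(), embedding.clone());
        self.order.push_back(text.to_string());
        embedding
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

In a canister, every cache hit avoids one HTTP outcall, which is where the cycle savings come from.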
Inter-Canister Data Sources
use contrag_core::data_sources::canister_state::CanisterStateSource;
let source = CanisterStateSource::new(entity_configs);
let user = source.read_entity::<User>("User", "user_123").await?;
Custom Similarity Metrics
use contrag_core::vector_store::{cosine_similarity, euclidean_distance};
let similarity = cosine_similarity(&embedding1, &embedding2);
let distance = euclidean_distance(&embedding1, &embedding2);
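For reference, the two metrics can be written in a few lines. The functions below are std-only stand-ins that share names with the crate's functions for readability. The crate's handling of edge cases, such as zero vectors or mismatched lengths, is an assumption here and may differ.

```rust
/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// Returns 0.0 if either vector has zero norm, to avoid dividing by zero.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    debug_assert_eq!(a.len(), b.len(), "vectors must have the same length");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Euclidean distance: square root of the sum of squared differences.
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    debug_assert_eq!(a.len(), b.len(), "vectors must have the same length");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}
```

Cosine similarity ignores vector magnitude (higher means more similar), while Euclidean distance is sensitive to it (lower means more similar).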
Comparison: TypeScript vs Rust
| Feature | TypeScript ContRAG | Rust ContRAG (ICP) |
|---|---|---|
| Data Sources | PostgreSQL, MongoDB | Canister state, Stable memory |
| Vector Storage | Weaviate, pgvector | On-chain (stable memory) |
| Embedders | OpenAI, Gemini | OpenAI, Gemini (HTTP outcalls) |
| Configuration | Config file + .env | Config file + .env |
| Schema Introspection | Automatic | Manual (trait impl) |
| Runtime | Node.js | WASM (ICP canister) |
| Storage Cost | External DB fees | ~$5/GB/year on-chain |
| Query Speed | Fast (indexed DB) | Fast (in-memory) |
| Web3 Native | No | Yes |
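The ~$5/GB/year figure in the table can be turned into a rough storage estimate. The sketch below assumes embeddings are stored as 32-bit floats and that 1 GB is 10^9 bytes. It ignores chunk text, metadata, and allocator overhead, so real usage will be higher. Both helper names are hypothetical.

```rust
/// Raw bytes needed for `count` embeddings of `dims` f32 components
/// (4 bytes each). Assumption: excludes chunk text, metadata, and overhead.
fn embedding_bytes(count: u64, dims: u64) -> u64 {
    count * dims * 4
}

/// Yearly cost in USD at `usd_per_gb_year`, treating 1 GB as 10^9 bytes.
fn yearly_cost_usd(bytes: u64, usd_per_gb_year: f64) -> f64 {
    bytes as f64 / 1e9 * usd_per_gb_year
}
```

Under these assumptions, one 1536-dimension vector takes 6,144 bytes, and one million of them take about 6.1 GB, or roughly $31 per year at $5/GB.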
Use Cases
- DeFi Personalization: Build user profiles from transaction history
- NFT Recommendations: Suggest NFTs based on user preferences and activity
- DAO Governance: Context-aware proposal summaries
- Web3 Social: Personalized content feeds from on-chain data
- Gaming: Player profiles and achievement context
- Identity Systems: Rich user context from verifiable credentials
Limitations & Roadmap
Current Limitations
- No automatic schema introspection (requires a manual `RagEntity` impl)
- Simple cosine similarity (no advanced indexing such as HNSW)
- Single-canister vector storage (no distributed sharding yet)
- HTTP outcall costs (cycles are consumed per API call)
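Because each HTTP outcall consumes cycles, sending several texts per request reduces the number of outcalls. The sketch below assumes the embedder accepts a batch of texts per call, which the `embed(Vec<String>)` calls in the Quick Start suggest. `batch_texts` is a hypothetical helper, and a suitable batch size depends on the provider's request limits.

```rust
/// Hypothetical helper: splits `texts` into batches of at most `batch_size`
/// items, so each batch can be sent in a single embedding request.
fn batch_texts(texts: &[String], batch_size: usize) -> Vec<Vec<String>> {
    assert!(batch_size > 0, "batch_size must be positive");
    texts
        .chunks(batch_size)
        .map(|batch| batch.to_vec())
        .collect()
}
```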
Roadmap
- Derive macro for `RagEntity` (auto-implementation)
- HNSW indexing for faster similarity search
- Multi-canister vector sharding
- IPFS/Arweave vector storage adapter
- Preference tracking (port from TypeScript)
- Cross-chain data sources (Ethereum, Bitcoin)
- Built-in embedding caching in stable memory
Examples
See /examples/user-canister for a complete working example with:
- User and Order entities
- Relationship mapping
- Context building
- Vector storage and search
- Demo data seeding
Contributing
Contributions welcome! This is an experimental project bringing RAG to Web3.
License
MIT License - see LICENSE file
Acknowledgments
- Based on ContRAG TypeScript
- Built for the Internet Computer Protocol
- Inspired by the vision of AI-powered Web3 applications
Ready to unlock RAG in Web3?
Start building context-aware canisters today!