2 releases
Uses new Rust 2024
0.1.1 | May 26, 2025 |
---|---|
0.1.0 | May 26, 2025 |
#615 in Database interfaces
43 downloads per month
73KB
868 lines
MemVDB - In-Memory Vector Database
MemVDB is a fast, lightweight in-memory vector database written in Rust. It provides efficient similarity search capabilities with support for multiple distance metrics, making it ideal for machine learning applications, recommendation systems, and semantic search.
🚀 Features
- Multiple Distance Metrics: Euclidean, Cosine, and Dot Product similarity
- High Performance: Optimized similarity search with binary heap algorithms
- Flexible Metadata: Store arbitrary metadata with each embedding
- Batch Operations: Efficient batch insertion and updates
- Thread Safety: Safe concurrent access with proper locking
- Zero Dependencies: Minimal external dependencies for core functionality
- Memory Efficient: Optimized data structures for large-scale operations
📦 Installation
Add MemVDB to your Cargo.toml
:
[dependencies]
memvdb = "0.1.0"
🎯 Quick Start
use memvdb::{CacheDB, Distance, Embedding};
use std::collections::HashMap;
// Create a new in-memory vector database
let mut db = CacheDB::new();
// Create a collection with 128-dimensional vectors using cosine similarity
db.create_collection("documents".to_string(), 128, Distance::Cosine).unwrap();
// Create an embedding with metadata
let mut id = HashMap::new();
id.insert("doc_id".to_string(), "doc_001".to_string());
let mut metadata = HashMap::new();
metadata.insert("title".to_string(), "Sample Document".to_string());
metadata.insert("category".to_string(), "AI".to_string());
let vector = vec![0.1; 128]; // 128-dimensional vector
let embedding = Embedding {
id,
vector,
metadata: Some(metadata),
};
// Insert the embedding
db.insert_into_collection("documents", embedding).unwrap();
// Perform similarity search
let query_vector = vec![0.2; 128];
let collection = db.get_collection("documents").unwrap();
let results = collection.get_similarity(&query_vector, 5);
println!("Found {} similar documents", results.len());
📚 Core Concepts
Collections
Collections are containers for embeddings with a specific dimensionality and distance metric. All embeddings within a collection must have the same vector dimension.
// Create collections with different distance metrics
db.create_collection("images".to_string(), 512, Distance::Euclidean).unwrap();
db.create_collection("text".to_string(), 384, Distance::Cosine).unwrap();
db.create_collection("recommendations".to_string(), 256, Distance::DotProduct).unwrap();
Embeddings
Embeddings consist of a unique identifier, vector data, and optional metadata:
let mut id = HashMap::new();
id.insert("user_id".to_string(), "user_123".to_string());
id.insert("item_id".to_string(), "item_456".to_string());
let mut metadata = HashMap::new();
metadata.insert("category".to_string(), "electronics".to_string());
metadata.insert("price".to_string(), "299.99".to_string());
let embedding = Embedding {
id,
vector: vec![0.1, 0.2, 0.3, 0.4],
metadata: Some(metadata),
};
Distance Metrics
Choose the appropriate distance metric based on your data and use case:
Euclidean Distance
- Best for: Spatial data, computer vision features
- Characteristics: Sensitive to magnitude, measures geometric distance
- Range: [0, ∞)
db.create_collection("spatial_data".to_string(), 3, Distance::Euclidean).unwrap();
Cosine Similarity
- Best for: Text embeddings, high-dimensional sparse data
- Characteristics: Ignores magnitude, measures angle between vectors
- Range: [0, 2] (converted from [-1, 1])
db.create_collection("documents".to_string(), 768, Distance::Cosine).unwrap();
Dot Product
- Best for: Pre-normalized vectors, neural network outputs
- Characteristics: Considers both angle and magnitude
- Range: (-∞, ∞)
db.create_collection("embeddings".to_string(), 512, Distance::DotProduct).unwrap();
🔧 API Reference
Database Operations
// Create a new database
let mut db = CacheDB::new();
// Create a collection
db.create_collection("my_collection".to_string(), 128, Distance::Cosine)?;
// Get a collection
let collection = db.get_collection("my_collection");
// Delete a collection
db.delete_collection("my_collection")?;
// Get all embeddings from a collection
let embeddings = db.get_embeddings("my_collection");
Embedding Operations
// Insert a single embedding
db.insert_into_collection("my_collection", embedding)?;
// Batch insert/update embeddings
let embeddings = vec![embedding1, embedding2, embedding3];
db.update_collection("my_collection", embeddings)?;
// Similarity search
let collection = db.get_collection("my_collection").unwrap();
let results = collection.get_similarity(&query_vector, 10);
Similarity Search
let results = collection.get_similarity(&query_vector, k);
for result in results {
println!("Score: {}", result.score);
println!("ID: {:?}", result.embedding.id);
println!("Metadata: {:?}", result.embedding.metadata);
}
📖 Examples
Document Similarity Search
use memvdb::{CacheDB, Distance, Embedding};
use std::collections::HashMap;
let mut db = CacheDB::new();
db.create_collection("articles".to_string(), 384, Distance::Cosine)?;
// Insert documents
let articles = vec![
("AI in Healthcare", vec![0.8, 0.7, 0.1, 0.2]),
("Machine Learning Basics", vec![0.7, 0.8, 0.2, 0.1]),
("Cooking Recipes", vec![0.1, 0.2, 0.9, 0.8]),
];
for (title, mut vector) in articles {
vector.resize(384, 0.0); // Pad to collection dimension
let mut id = HashMap::new();
id.insert("title".to_string(), title.to_string());
let mut metadata = HashMap::new();
metadata.insert("category".to_string(),
if title.contains("AI") || title.contains("Machine") {
"Technology".to_string()
} else {
"Lifestyle".to_string()
});
let embedding = Embedding {
id,
vector,
metadata: Some(metadata),
};
db.insert_into_collection("articles", embedding)?;
}
// Search for AI-related content
let mut query = vec![0.75, 0.75, 0.1, 0.1];
query.resize(384, 0.0);
let collection = db.get_collection("articles").unwrap();
let results = collection.get_similarity(&query, 3);
for (i, result) in results.iter().enumerate() {
println!("{}. {} (Score: {:.4})",
i + 1,
result.embedding.id.get("title").unwrap(),
result.score);
}
Batch Operations
// Generate many embeddings
let embeddings: Vec<Embedding> = (0..1000).map(|i| {
let mut id = HashMap::new();
id.insert("id".to_string(), i.to_string());
Embedding {
id,
vector: vec![i as f32 / 1000.0; 128],
metadata: None,
}
}).collect();
// Batch insert for better performance
db.update_collection("large_collection", embeddings)?;
E-commerce Recommendations
let mut db = CacheDB::new();
db.create_collection("products".to_string(), 256, Distance::DotProduct)?;
// Product embeddings with metadata
let products = vec![
("laptop_001", "Gaming Laptop", "Electronics", 1299.99),
("book_001", "Rust Programming", "Books", 39.99),
("headphones_001", "Wireless Headphones", "Electronics", 199.99),
];
for (product_id, name, category, price) in products {
let mut id = HashMap::new();
id.insert("product_id".to_string(), product_id.to_string());
let mut metadata = HashMap::new();
metadata.insert("name".to_string(), name.to_string());
metadata.insert("category".to_string(), category.to_string());
metadata.insert("price".to_string(), price.to_string());
// Simulate product embedding based on features
let vector = generate_product_embedding(name, category);
let embedding = Embedding {
id,
vector,
metadata: Some(metadata),
};
db.insert_into_collection("products", embedding)?;
}
// Find similar products
let user_preference_vector = vec![0.5; 256];
let collection = db.get_collection("products").unwrap();
let recommendations = collection.get_similarity(&user_preference_vector, 5);
🔧 Building and Testing
Build
cargo build --release
Run Tests
# Run all tests
cargo test
# Run specific test categories
cargo test --lib # Unit tests
cargo test --test integration_tests # Integration tests
cargo test --test similarity_tests # Similarity function tests
cargo test --test error_tests # Error handling tests
cargo test --test performance_tests # Performance tests
Run Examples
# Basic usage example
cargo run --example basic_usage
# Document similarity search
cargo run --example document_similarity
# Distance metrics comparison
cargo run --example distance_metrics
# Batch operations
cargo run --example batch_operations
🛣️ Use Cases
Machine Learning
- Feature similarity search: Find similar data points in feature space
- Nearest neighbor algorithms: Implement k-NN classifiers
- Clustering validation: Evaluate cluster quality with distance metrics
Natural Language Processing
- Semantic search: Find semantically similar documents or sentences
- Document retrieval: Information retrieval systems
- Question answering: Find relevant context for questions
Computer Vision
- Image similarity: Find visually similar images
- Face recognition: Match face embeddings
- Object detection: Compare object feature vectors
Recommendation Systems
- Content-based filtering: Recommend similar items
- User similarity: Find users with similar preferences
- Hybrid recommendations: Combine multiple embedding types
Information Retrieval
- Search engines: Semantic document search
- Knowledge bases: Find related concepts
- Data deduplication: Identify similar records
🤝 Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
Development Setup
- Clone the repository
- Install Rust (1.70+ recommended)
- Run tests:
cargo test
- Run examples:
cargo run --example basic_usage
Code Style
We use standard Rust formatting:
cargo fmt
cargo clippy
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Inspired by (memvectordb)https://github.com/KevKibe/memvectordb
- Built with Rust's performance and safety in mind
- Optimized for machine learning and AI applications
📞 Support
- Documentation: docs.rs/memvdb
- Issues: GitHub Issues
- Discussions: GitHub Discussions
MemVDB - Fast, lightweight, and efficient in-memory vector search for Rust applications.
Dependencies
~1.9–3MB
~61K SLoC