2 releases
| Version | Released |
|---|---|
| 0.1.0-alpha.3 | Oct 12, 2025 |
| 0.1.0-alpha.2 | Oct 4, 2025 |
| 0.1.0-alpha.1 | |
OxiRS Vec - Vector Search Engine
Status: Alpha Release (v0.1.0-alpha.3) - Released October 12, 2025
⚠️ Alpha Software: This is an early alpha release with experimental features. APIs may change without notice, and it is not recommended for production use.
High-performance vector search infrastructure for semantic similarity search in RDF knowledge graphs.
Features
Vector Indexing
- HNSW Index - Hierarchical Navigable Small World graphs for fast approximate nearest neighbor search
- Flat Index - Exact search for smaller datasets
- IVF Index - Inverted file index for large-scale datasets
- Dynamic Updates - Real-time index updates without full rebuilds
Search Capabilities
- Similarity Search - Find semantically similar entities
- Filtered Search - Combine vector similarity with RDF constraints
- Batch Operations - Efficient bulk indexing and search
- Multiple Distance Metrics - Cosine, Euclidean, Manhattan, Dot product
Integration
- SPARQL Extension - Vector search functions in SPARQL queries
- GraphQL Support - Vector similarity in GraphQL queries
- Embedding Models - Integration with various embedding providers
- Storage Backends - Persistent vector indices
Installation
Add to your Cargo.toml:
# Experimental feature
[dependencies]
oxirs-vec = "0.1.0-alpha.3"
Quick Start
Basic Vector Search
use oxirs_vec::{VectorStore, IndexType, DistanceMetric};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder embeddings for illustration only (assumes f32 vectors);
    // in practice these come from an embedding model
    let embedding1 = vec![0.1_f32; 768];
    let embedding2 = vec![0.2_f32; 768];
    let query_vector = vec![0.1_f32; 768];

    // Create vector store with HNSW index
    let mut store = VectorStore::builder()
        .index_type(IndexType::HNSW)
        .dimension(768) // Embedding dimension
        .distance_metric(DistanceMetric::Cosine)
        .build()?;

    // Add vectors
    store.add_vector("entity1", &embedding1)?;
    store.add_vector("entity2", &embedding2)?;

    // Build index
    store.build_index()?;

    // Search for similar vectors
    let results = store.search(&query_vector, 10, 0.8)?;
    for result in results {
        println!("ID: {}, Score: {}", result.id, result.score);
    }
    Ok(())
}
SPARQL Integration
use oxirs_vec::sparql::VectorFunctions;

let sparql = r#"
    PREFIX vec: <http://oxirs.org/vec/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?entity ?score WHERE {
        ?entity a foaf:Person .
        # Vector similarity search
        ?entity vec:similarTo "machine learning researcher" .
        ?entity vec:similarity ?score .
        FILTER (?score > 0.8)
    }
    ORDER BY DESC(?score)
    LIMIT 10
"#;
Architecture
Index Types
HNSW (Hierarchical Navigable Small World)
- Use Case: General purpose, balanced performance
- Search Time: O(log N)
- Build Time: O(N log N)
- Memory: Moderate
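The logarithmic search time comes from greedy routing through a layered proximity graph. The following is a minimal sketch of the greedy step on a single layer, assuming squared Euclidean distance; `GraphLayer` and `greedy_search` are hypothetical names for illustration and are not part of the oxirs-vec API. A full HNSW search repeats this step from the top layer downward and widens the candidate set on the bottom layer.

```rust
/// One layer of a proximity graph: `neighbors[i]` lists the nodes linked to node `i`.
/// Hypothetical illustration type, not taken from oxirs-vec.
pub struct GraphLayer {
    pub vectors: Vec<Vec<f32>>,
    pub neighbors: Vec<Vec<usize>>,
}

/// Squared Euclidean distance between two vectors of equal length.
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

impl GraphLayer {
    /// Greedy descent: starting at `entry`, keep hopping to the neighbor
    /// closest to `query` until no neighbor is closer than the current node.
    pub fn greedy_search(&self, query: &[f32], entry: usize) -> usize {
        let mut current = entry;
        let mut best = sq_dist(&self.vectors[current], query);
        loop {
            let mut improved = false;
            for &n in &self.neighbors[current] {
                let d = sq_dist(&self.vectors[n], query);
                if d < best {
                    best = d;
                    current = n;
                    improved = true;
                }
            }
            if !improved {
                return current; // local minimum reached
            }
        }
    }
}
```

The result is a local minimum, not a guaranteed nearest neighbor, which is why HNSW results are approximate.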
Flat Index
- Use Case: Small datasets, exact search required
- Search Time: O(N)
- Build Time: O(N)
- Memory: Low
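The O(N) search time reflects a full scan of every stored vector. A minimal sketch of such an exact search, assuming squared Euclidean distance and a plain slice as storage; `flat_search` is a hypothetical helper, not the crate's implementation.

```rust
/// Exact k-nearest-neighbour search by scanning every stored vector.
/// Returns (id, squared Euclidean distance) pairs, closest first.
/// Hypothetical helper for illustration only.
pub fn flat_search<'a>(
    vectors: &'a [(String, Vec<f32>)],
    query: &[f32],
    k: usize,
) -> Vec<(&'a str, f32)> {
    // Score every vector against the query: this is the O(N) part.
    let mut scored: Vec<(&str, f32)> = vectors
        .iter()
        .map(|(id, v)| {
            let d: f32 = v.iter().zip(query).map(|(x, y)| (x - y).powi(2)).sum();
            (id.as_str(), d)
        })
        .collect();
    // Sort ascending by distance and keep the k closest.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```

Because nothing is skipped, the result is exact, at the cost of touching all N vectors per query.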
IVF (Inverted File)
- Use Case: Large datasets where approximate results are acceptable
- Search Time: O(√N)
- Build Time: O(N)
- Memory: Moderate
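An inverted file index avoids a full scan by grouping vectors under their nearest centroid and searching only a few groups per query. The sketch below illustrates the idea under stated assumptions: centroids are supplied by the caller (they would normally come from k-means), distance is squared Euclidean, and `IvfSketch` and `nprobe` are hypothetical names that do not come from oxirs-vec.

```rust
/// Toy inverted-file index: each vector is stored in the list of its nearest centroid.
/// Hypothetical illustration type, not taken from oxirs-vec.
pub struct IvfSketch {
    centroids: Vec<Vec<f32>>,
    lists: Vec<Vec<(usize, Vec<f32>)>>, // one posting list per centroid
}

/// Squared Euclidean distance between two vectors of equal length.
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

impl IvfSketch {
    /// Centroids are supplied by the caller; at least one is required before `add`.
    pub fn new(centroids: Vec<Vec<f32>>) -> Self {
        let lists = vec![Vec::new(); centroids.len()];
        Self { centroids, lists }
    }

    /// Centroid indices ordered by increasing distance to `v`.
    fn ranked_centroids(&self, v: &[f32]) -> Vec<usize> {
        let mut order: Vec<usize> = (0..self.centroids.len()).collect();
        order.sort_by(|&a, &b| {
            sq_dist(&self.centroids[a], v).total_cmp(&sq_dist(&self.centroids[b], v))
        });
        order
    }

    /// Append the vector to the posting list of its nearest centroid.
    pub fn add(&mut self, id: usize, v: Vec<f32>) {
        let nearest = self.ranked_centroids(&v)[0];
        self.lists[nearest].push((id, v));
    }

    /// Scan only the `nprobe` closest lists and return up to `k` ids, closest first.
    pub fn search(&self, query: &[f32], k: usize, nprobe: usize) -> Vec<usize> {
        let mut hits: Vec<(usize, f32)> = self
            .ranked_centroids(query)
            .into_iter()
            .take(nprobe)
            .flat_map(|c| self.lists[c].iter())
            .map(|(id, v)| (*id, sq_dist(v, query)))
            .collect();
        hits.sort_by(|a, b| a.1.total_cmp(&b.1));
        hits.into_iter().take(k).map(|(id, _)| id).collect()
    }
}
```

Raising `nprobe` scans more lists, trading query time for recall; with `nprobe` equal to the number of centroids the search degenerates to an exact scan.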
Distance Metrics
pub enum DistanceMetric {
    Cosine,     // For normalized embeddings
    Euclidean,  // For absolute distances
    Manhattan,  // For high-dimensional spaces
    DotProduct, // For similarity scores
}
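For reference, the four metrics can be written in a few lines of plain Rust. This is a sketch of the standard definitions over `f32` slices of equal length; the function names are illustrative and are not the crate's internals.

```rust
/// Dot product: larger means more similar.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) distance: smaller means more similar.
pub fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

/// Manhattan (L1) distance: smaller means more similar.
pub fn manhattan(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

/// Cosine similarity in [-1, 1]; returns 0.0 if either vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let norm = dot(a, a).sqrt() * dot(b, b).sqrt();
    if norm == 0.0 {
        0.0
    } else {
        dot(a, b) / norm
    }
}
```

Note that cosine and dot product are similarities (higher is better), while Euclidean and Manhattan are distances (lower is better). For unit-length embeddings, cosine similarity equals the dot product.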
Advanced Features
Filtered Search
Combine vector similarity with RDF constraints:
use oxirs_vec::FilteredSearch;

let filters = FilteredSearch::builder()
    .add_constraint("rdf:type", "foaf:Person")
    .add_constraint("foaf:age", |age: i32| age > 18)
    .build();
let results = store.filtered_search(&query_vector, filters, 10)?;
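Conceptually, a filtered search applies the constraints to the candidate set and ranks only the entities that pass. The sketch below shows one simple way to do this (filter first, then score by dot product) in plain Rust. The `Entity` struct, its string-valued `properties` map, and this free-standing `filtered_search` function are assumptions made for illustration; they say nothing about how oxirs-vec evaluates constraints internally.

```rust
use std::collections::HashMap;

/// Hypothetical entity record: an embedding plus flat key/value properties.
pub struct Entity {
    pub id: String,
    pub vector: Vec<f32>,
    pub properties: HashMap<String, String>, // e.g. "rdf:type" -> "foaf:Person"
}

/// Keep only entities accepted by `filter`, then rank the survivors
/// by dot-product score (highest first) and return the top `k`.
pub fn filtered_search<'a, F>(
    entities: &'a [Entity],
    query: &[f32],
    filter: F,
    k: usize,
) -> Vec<(&'a str, f32)>
where
    F: Fn(&Entity) -> bool,
{
    let mut hits: Vec<(&str, f32)> = entities
        .iter()
        .filter(|&e| filter(e))
        .map(|e| {
            let score: f32 = e.vector.iter().zip(query).map(|(x, y)| x * y).sum();
            (e.id.as_str(), score)
        })
        .collect();
    // Sort descending by score, since a higher dot product means more similar.
    hits.sort_by(|a, b| b.1.total_cmp(&a.1));
    hits.truncate(k);
    hits
}
```

Filtering before scoring keeps the result size predictable; filtering after an approximate search is cheaper but can return fewer than `k` results when the constraint is selective.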
Batch Operations
Efficient bulk indexing:
let batch = vec![
    ("entity1", embedding1),
    ("entity2", embedding2),
    ("entity3", embedding3),
];
store.add_batch(batch)?;
store.build_index()?;
Incremental Updates
// Add without full rebuild
store.add_incremental("new_entity", &embedding)?;
// Periodic optimization
store.optimize_index()?;
Performance
Benchmarks (on sample datasets)
| Dataset Size | Index Type | Build Time | Query Time (10-NN) |
|---|---|---|---|
| 10K vectors | HNSW | 2.5s | 0.5ms |
| 100K vectors | HNSW | 28s | 1.2ms |
| 1M vectors | HNSW | 320s | 2.8ms |
| 10K vectors | Flat | 0.1s | 12ms |
| 100K vectors | IVF | 15s | 3.5ms |
Benchmarked on an M1 Mac with 768-dimensional vectors.
Configuration
let config = VectorStoreConfig {
    index_type: IndexType::HNSW,
    dimension: 768,
    distance_metric: DistanceMetric::Cosine,

    // HNSW-specific parameters
    hnsw_m: 16,                // Number of connections per node
    hnsw_ef_construction: 200, // Construction time accuracy
    hnsw_ef_search: 100,       // Search time accuracy

    // Storage options
    persist_path: Some("./vector_index".into()),
    cache_size: 1000,
};
Integration Examples
With oxirs-embed
use oxirs_embed::EmbeddingModel;
use oxirs_vec::{IndexType, VectorStore};

// Generate embeddings
let model = EmbeddingModel::load("sentence-transformers/all-mpnet-base-v2")?;
let embedding = model.encode("Machine learning research")?;

// Index and search
let mut store = VectorStore::new(IndexType::HNSW, 768)?;
store.add_vector("doc1", &embedding)?;
With oxirs-core (RDF)
use oxirs_core::Dataset;
use oxirs_vec::RdfVectorIndex;

// `model` is the EmbeddingModel loaded in the oxirs-embed example above
let dataset = Dataset::from_file("knowledge_graph.ttl")?;
let mut index = RdfVectorIndex::new(&dataset)?;

// Index entities by their descriptions
for entity in dataset.subjects() {
    if let Some(description) = dataset.get_description(&entity) {
        let embedding = model.encode(&description)?;
        index.add_entity(&entity, &embedding)?;
    }
}
Status
Alpha Release (v0.1.0-alpha.3)
- ✅ HNSW/IVF/Flat indices with persisted dataset support
- ✅ SPARQL/GraphQL integration enhanced with federation-aware vector filters
- ✅ CLI pipelines for batch embedding import/export and monitoring
- ✅ SciRS2 metrics for query latency, recall, and index health
- 🚧 GPU acceleration (targeted for beta)
- 🚧 Distributed indexing (planned for v0.2.0)
Contributing
This is an experimental module. Feedback and contributions are welcome!
License
MIT OR Apache-2.0
See Also
- oxirs-embed - Embedding generation
- oxirs-arq - SPARQL query engine
- oxirs-core - RDF data model