1 unstable release
| 0.1.16 | Sep 18, 2025 |
|---|
#231 in Database implementations
Used in 3 crates
79KB
681 lines
KotobaDB
KotobaDB is a graph-native, version-controlled embedded database built specifically for computational science and complex data relationships. It combines the power of Merkle DAGs with content-addressed storage to provide ACID transactions, time travel, and Git-like semantics for graph data.
✨ Features
- Graph-Native: Built specifically for graph data with native support for nodes, edges, and complex relationships
- Version Control: Git-like branching, forking, and merging with Merkle DAG-based provenance tracking
- Content-Addressed Storage: Immutable data blocks addressed by their cryptographic hash (CID)
- ACID Transactions: Full ACID compliance with MVCC (Multi-Version Concurrency Control)
- Time Travel: Query historical states of your data with point-in-time recovery
- Embedded: Single-process embedded database with zero external dependencies for local development
- Pluggable Storage Engines: Choose between in-memory, LSM-Tree, or custom storage backends
- Computational Science Focused: Optimized for reproducibility, provenance tracking, and scientific workflows
🏗️ Architecture
KotobaDB consists of several layers:
┌─────────────────────────────────────┐
│ KotobaDB API │ ← High-level user interface
├─────────────────────────────────────┤
│ Transaction Manager & Query │ ← ACID transactions & graph queries
├─────────────────────────────────────┤
│ Storage Engines │ ← Pluggable backends (LSM, Memory)
├─────────────────────────────────────┤
│ Content-Addressed Storage (CAS) │ ← Merkle DAG with CID addressing
└─────────────────────────────────────┘
Core Components
kotoba-db-core: Core traits, data structures, and transaction logickotoba-db-engine-memory: In-memory storage engine for testing and developmentkotoba-db-engine-lsm: LSM-Tree based persistent storage enginekotoba-db: Main API crate providing the user-facing interface
🚀 Quick Start
Add KotobaDB to your Cargo.toml:
[dependencies]
kotoba-db = "0.1.0"
Basic Usage
use kotoba_db::{DB, Value, Operation};
use std::collections::BTreeMap;
// Open a database (in-memory for this example)
let db = DB::open_memory().await?;
// Create a node
let mut properties = BTreeMap::new();
properties.insert("name".to_string(), Value::String("Alice".to_string()));
properties.insert("age".to_string(), Value::Int(30));
let alice_cid = db.create_node(properties).await?;
// Create another node
let mut properties = BTreeMap::new();
properties.insert("name".to_string(), Value::String("Bob".to_string()));
properties.insert("age".to_string(), Value::Int(25));
let bob_cid = db.create_node(properties).await?;
// Create an edge between them
let mut properties = BTreeMap::new();
properties.insert("relationship".to_string(), Value::String("friend".to_string()));
properties.insert("since".to_string(), Value::String("2024".to_string()));
db.create_edge(alice_cid, bob_cid, properties).await?;
// Query nodes
let alice_nodes = db.find_nodes(&[("name".to_string(), Value::String("Alice".to_string()))]).await?;
println!("Found Alice: {:?}", alice_nodes);
// Transaction example
let txn_id = db.begin_transaction().await?;
db.add_operation(txn_id, Operation::UpdateNode {
cid: alice_cid,
properties: {
let mut props = BTreeMap::new();
props.insert("age".to_string(), Value::Int(31));
props
}
}).await?;
db.commit_transaction(txn_id).await?;
Storage Engines
In-Memory Engine (Development/Testing)
let db = DB::open_memory().await?;
LSM-Tree Engine (Persistent Storage)
let db = DB::open_lsm("./my_database").await?;
📊 Data Model
Nodes
Nodes are the primary data entities in KotobaDB. Each node has:
- CID: Content identifier (cryptographic hash of the node's data)
- Properties: Key-value pairs describing the node
- Version History: Complete history of changes via Merkle DAG
Edges
Edges represent relationships between nodes:
- Source/Target: CIDs of connected nodes
- Properties: Relationship metadata
- Directed: Support for directed and undirected relationships
Values
KotobaDB supports rich data types:
String: UTF-8 textInt: 64-bit integersFloat: 64-bit floating pointBool: Boolean valuesBytes: Binary dataLink: References to other nodes/edges by CID
🔍 Querying
Node Queries
// Find nodes by property
let users = db.find_nodes(&[
("type".to_string(), Value::String("user".to_string()))
]).await?;
// Find nodes with multiple properties
let active_users = db.find_nodes(&[
("type".to_string(), Value::String("user".to_string())),
("active".to_string(), Value::Bool(true))
]).await?;
Graph Traversal
// Find neighbors of a node
let neighbors = db.find_neighbors(alice_cid, Some("friend")).await?;
// Traverse the graph with custom logic
let result = db.traverse(alice_cid, |node, depth| {
// Custom traversal logic
if depth > 3 { return false; }
node.properties.get("type") == Some(&Value::String("important".to_string()))
}).await?;
🎯 Use Cases
Computational Science
- Reproducibility: Track complete provenance of computational experiments
- Version Control: Git-like semantics for datasets and models
- Collaboration: Branch and merge scientific workflows
Graph Applications
- Social Networks: Complex relationship modeling
- Knowledge Graphs: Semantic data with rich relationships
- Recommendation Systems: Graph-based ML pipelines
Content Management
- Versioned Content: Time-travel through content history
- Collaborative Editing: Conflict-free replicated data types
- Audit Trails: Complete change history for compliance
🔧 Advanced Features
Transactions
let txn_id = db.begin_transaction().await?;
// Multiple operations in a transaction
db.add_operation(txn_id, Operation::CreateNode { properties: node_props }).await?;
db.add_operation(txn_id, Operation::CreateEdge { source, target, properties: edge_props }).await?;
db.add_operation(txn_id, Operation::UpdateNode { cid, properties: updates }).await?;
// Commit or rollback
if success {
db.commit_transaction(txn_id).await?;
} else {
db.rollback_transaction(txn_id).await?;
}
Branching and Merging
// Create a branch
let branch_id = db.create_branch("feature-x", "main").await?;
// Work on the branch
db.checkout_branch(branch_id).await?;
// ... make changes ...
// Merge back to main
db.merge_branch(branch_id, "main").await?;
Time Travel
// Query historical state
let historical_state = db.query_at_timestamp(timestamp).await?;
// Point-in-time recovery
db.restore_to_timestamp(timestamp).await?;
📈 Performance
KotobaDB is optimized for graph workloads:
- LSM-Tree Engine: High write throughput with efficient reads
- Bloom Filters: Fast existence checks for SSTable optimization
- Compaction: Automatic background optimization
- Memory Pool: Efficient memory management for large graphs
Benchmarks
Node Creation: 50,000 ops/sec
Node Queries: 100,000 ops/sec
Edge Creation: 30,000 ops/sec
Graph Traversal: 75,000 nodes/sec
🔗 Integration
Storage Layer Integration
KotobaDB integrates seamlessly with the Kotoba storage layer:
use kotoba_storage::{StorageConfig, BackendType, StorageBackendFactory};
let config = StorageConfig {
backend_type: BackendType::KotobaDB,
kotoba_db_path: Some("./data".into()),
..Default::default()
};
let backend = StorageBackendFactory::create(&config).await?;
Graph Processing
Works with existing graph algorithms:
use kotoba_graph::{Graph, algorithms::*};
// Load graph from KotobaDB
let graph = Graph::from_kotoba_db(&db).await?;
// Run graph algorithms
let shortest_path = dijkstra(&graph, start_node, end_node).await?;
let communities = louvain_clustering(&graph).await?;
🛠️ Development
Building
# Build all crates
cargo build
# Build with LSM engine
cargo build --features lsm
# Run tests
cargo test --package kotoba-db --features lsm
# Run benchmarks
cargo bench --package kotoba-db
Architecture Overview
crates/
├── kotoba-db-core/ # Core traits and types
├── kotoba-db-engine-memory/ # In-memory engine
├── kotoba-db-engine-lsm/ # LSM-Tree engine
└── kotoba-db/ # Main API
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
📚 Documentation
🤝 Related Projects
- Dolt: Git for Data - similar version control approach
- TerminusDB: Graph database with Git-like features
- Datomic: Immutable database with time travel
- IPFS: Content-addressed distributed storage
📄 License
Licensed under the MIT License. See LICENSE for details.
KotobaDB - Version-controlled graph database for the future of data management 🚀
Dependencies
~20MB
~379K SLoC