2 releases
new 0.1.1 | Jan 22, 2025 |
---|---|
0.1.0 | Jan 22, 2025 |
#1 in #bitcask
45KB
599 lines
bitask
Bitask is a Rust implementation of Bitcask, a log-structured key-value store optimized for high-performance reads and writes. Here's the summary in 10 lines:
- Core Idea: Bitcask uses append-only logs for writes, ensuring atomic and durable operations.
- Indexing: Keys are stored in memory in a BTreeMap, mapping to their positions in the logs for O(1) lookups.
- Data Layout: Storage consists of active file (for writes) and sealed immutable files.
- Reads: Lookups find key positions in memory, then read values directly from log files.
- Compaction: Manual compaction merges files to reclaim space and remove obsolete entries.
- Crash Recovery: Index rebuilds on startup by scanning logs, using timestamps for conflict resolution.
- Process Safety: File-based locking ensures single-writer, multiple-reader access.
- Data Integrity: CRC32 checksums verify data correctness.
- File Management: Automatic log rotation at 4MB with timestamp-based naming.
- Design Principles: Simplicity, durability, and efficient reads/writes.
Paper
https://riak.com/assets/bitcask-intro.pdf
Project
This implementation provides:
- A Rust library crate for embedding in other projects
- Core Bitcask features including:
- Append-only log structure with automatic rotation (4MB per file)
- In-memory key directory using BTreeMap
- Process-safe file locking
- Crash recovery through log replay
- Data integrity via CRC32 checksums
- Only byte arrays (
Vec<u8>
) are supported for keys and values
Usage
use bitask::db::Bitask;
// Open database with exclusive write access
let mut db = Bitask::open("./db")?;
// Store a value
db.put(b"key".to_vec(), b"value".to_vec())?;
// Retrieve a value
let value = db.ask(b"key")?;
assert_eq!(value, b"value");
// Remove a value
db.remove(b"key".to_vec())?;
// Manual compaction
db.compact()?;
// Process safety demonstration
let another_db = Bitask::open("./db");
assert!(matches!(another_db.err().unwrap(), bitask::db::Error::WriterLock));
Implementation Details
Log Files
- Active file:
<timestamp>.active.log
- Current file being written to - Sealed files:
<timestamp>.log
- Immutable files after rotation - Lock file:
db.lock
- Ensures single-writer access
Log Rotation
- Active log files rotate automatically at 4MB
- Files are named with millisecond timestamps
- After rotation,
.active.log
becomes.log
and new.active.log
is created
Durability Guarantees
- Atomic single-key operations
- Crash recovery through log replay
- Data integrity verification via CRC32
Limitations
- All keys must fit in memory
- Single writer at a time
- No multi-key transactions
Dependencies
~1.6–2.5MB
~42K SLoC