13 releases (6 breaking)

Uses the Rust 2024 edition

0.8.5-alpha Dec 3, 2025
0.8.3-alpha Nov 27, 2025
0.7.0-alpha Nov 5, 2025
0.6.0-alpha Oct 31, 2025
0.2.0-alpha Oct 3, 2025

#134 in Database implementations

84 downloads per month
Used in 12 crates (8 directly)

Apache-2.0

715KB
16K SLoC

Columnar storage engine for LLKV.

This crate provides the low-level columnar layer that persists Apache Arrow RecordBatches to disk and supports efficient scans, filters, and updates. It serves as the foundation for llkv-table and higher-level query execution.

Role in the Story

The column map is where LLKV's Arrow-first design meets pager-backed persistence. Every sqllogictest shipped with SQLite, along with an expanding set of DuckDB suites, ultimately routes through these descriptors and chunk walkers. The storage layer therefore carries the burden of matching SQLite semantics while staying efficient enough for OLAP workloads. Gaps uncovered by the logic tests are treated as defects in this crate, not as harness exceptions.

The engine is maintained in the open by a single developer. These docs aim to give newcomers the same context captured in the README and DeepWiki pages so the story remains accessible as the project grows.

Architecture

The storage engine is organized into several key components:

  • ColumnStore: Primary interface for storing and retrieving columnar data. Manages column descriptors, metadata catalogs, and coordinates with the pager for persistent storage.

  • ScanBuilder: Builder for constructing column scans with options such as filters, ordering, and row ID inclusion.

  • Visitor Pattern: Scans emit data through visitor callbacks rather than materializing entire columns in memory, enabling streaming and aggregation.
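
The builder-plus-visitor split described above can be sketched roughly as follows. This is a minimal illustration, not the crate's API: `ChunkVisitor`, the `ScanBuilder` fields and methods, and `SumVisitor` are all hypothetical names, and plain `Vec<i64>` chunks stand in for Arrow arrays.

```rust
// Hypothetical sketch: none of these names are taken from llkv-column-map.

/// Receives values one chunk at a time instead of a fully materialized column.
trait ChunkVisitor {
    fn visit_chunk(&mut self, row_ids: &[u64], values: &[i64]);
}

/// Collects scan options before the scan runs.
struct ScanBuilder<'a> {
    chunks: &'a [Vec<i64>],
    min_value: Option<i64>,
    include_row_ids: bool,
}

impl<'a> ScanBuilder<'a> {
    fn new(chunks: &'a [Vec<i64>]) -> Self {
        Self { chunks, min_value: None, include_row_ids: false }
    }

    /// Keep only values greater than or equal to `min`.
    fn filter_at_least(mut self, min: i64) -> Self {
        self.min_value = Some(min);
        self
    }

    /// Ask the scan to pass row IDs alongside the values.
    fn with_row_ids(mut self, yes: bool) -> Self {
        self.include_row_ids = yes;
        self
    }

    /// Walk the chunks in order and hand each filtered chunk to the visitor.
    fn run<V: ChunkVisitor>(self, visitor: &mut V) {
        let mut next_row_id = 0u64;
        for chunk in self.chunks {
            let mut ids = Vec::new();
            let mut vals = Vec::new();
            for &v in chunk {
                let row_id = next_row_id;
                next_row_id += 1;
                if self.min_value.map_or(true, |m| v >= m) {
                    if self.include_row_ids {
                        ids.push(row_id);
                    }
                    vals.push(v);
                }
            }
            visitor.visit_chunk(&ids, &vals);
        }
    }
}

/// Aggregates while streaming; it never holds more than one chunk.
#[derive(Default)]
struct SumVisitor {
    sum: i64,
    rows: usize,
}

impl ChunkVisitor for SumVisitor {
    fn visit_chunk(&mut self, _row_ids: &[u64], values: &[i64]) {
        self.sum += values.iter().sum::<i64>();
        self.rows += values.len();
    }
}

fn main() {
    let chunks = vec![vec![1, 5, 9], vec![2, 8]];
    let mut visitor = SumVisitor::default();
    ScanBuilder::new(&chunks)
        .filter_at_least(5)
        .with_row_ids(true)
        .run(&mut visitor);
    println!("sum={} rows={}", visitor.sum, visitor.rows);
}
```

Because the visitor only ever sees one chunk, an aggregate such as a sum needs memory proportional to the chunk size rather than the column size.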

Storage Model

Data is stored in columnar chunks:

  • Each column is identified by a LogicalFieldId
  • Columns are broken into chunks for incremental writes
  • Each chunk stores Arrow-serialized data plus metadata (row count, min/max values)
  • Shadow columns track row IDs separately from user data
  • MVCC columns (created_by, deleted_by) track transaction visibility
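
One common use of per-chunk min/max metadata is to skip chunks that cannot match a range filter. The sketch below illustrates that idea under assumptions: `ChunkMeta`, its fields, and `chunks_to_scan` are hypothetical, and whether the crate prunes chunks in exactly this way is not stated here.

```rust
/// Hypothetical per-chunk metadata; the field names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ChunkMeta {
    row_count: usize,
    min: i64,
    max: i64,
}

impl ChunkMeta {
    /// Compute metadata from a chunk's values. Returns None for an empty chunk.
    fn from_values(values: &[i64]) -> Option<Self> {
        let min = *values.iter().min()?;
        let max = *values.iter().max()?;
        Some(Self { row_count: values.len(), min, max })
    }

    /// True if the chunk's [min, max] range intersects the query range [lo, hi].
    fn may_overlap(&self, lo: i64, hi: i64) -> bool {
        self.min <= hi && self.max >= lo
    }
}

/// Indices of the chunks that still have to be read for the range [lo, hi].
fn chunks_to_scan(metas: &[ChunkMeta], lo: i64, hi: i64) -> Vec<usize> {
    metas
        .iter()
        .enumerate()
        .filter(|(_, meta)| meta.may_overlap(lo, hi))
        .map(|(index, _)| index)
        .collect()
}

fn main() {
    let chunks = [vec![1, 4, 7], vec![10, 12], vec![20, 25, 30]];
    let metas: Vec<ChunkMeta> = chunks
        .iter()
        .filter_map(|chunk| ChunkMeta::from_values(chunk))
        .collect();
    for meta in &metas {
        println!("rows={} min={} max={}", meta.row_count, meta.min, meta.max);
    }
    println!("chunks to read for 8..=21: {:?}", chunks_to_scan(&metas, 8, 21));
}
```

The overlap test is conservative: a chunk that passes may still contain no matching rows, so the values themselves are filtered afterwards.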

Namespaces

Columns are organized into namespaces to prevent ID collisions:

  • UserData: Regular table columns
  • RowIdShadow: Internal row ID tracking for each column
  • TxnCreatedBy: MVCC transaction that created each row
  • TxnDeletedBy: MVCC transaction that deleted each row
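
To show how a namespace tag can keep IDs from colliding, here is a minimal sketch that packs the tag into the high bits of a 64-bit ID. The bit layout (8 tag bits, 56 column bits), the `FieldId` type, and its methods are assumptions made for illustration; the actual `LogicalFieldId` layout may differ.

```rust
/// The four namespaces listed above, with hypothetical numeric tags.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum Namespace {
    UserData = 0,
    RowIdShadow = 1,
    TxnCreatedBy = 2,
    TxnDeletedBy = 3,
}

/// Assumed layout: top 8 bits hold the namespace tag, low 56 bits the column number.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct FieldId(u64);

const COLUMN_BITS: u32 = 56;
const COLUMN_MASK: u64 = (1 << COLUMN_BITS) - 1;

impl FieldId {
    fn new(ns: Namespace, column: u64) -> Self {
        assert!(column <= COLUMN_MASK, "column number does not fit in 56 bits");
        FieldId(((ns as u64) << COLUMN_BITS) | column)
    }

    fn namespace(self) -> Namespace {
        match (self.0 >> COLUMN_BITS) as u8 {
            0 => Namespace::UserData,
            1 => Namespace::RowIdShadow,
            2 => Namespace::TxnCreatedBy,
            3 => Namespace::TxnDeletedBy,
            other => panic!("unknown namespace tag {other}"),
        }
    }

    fn column(self) -> u64 {
        self.0 & COLUMN_MASK
    }

    /// Same column number, different namespace (e.g. the row ID shadow of a user column).
    fn in_namespace(self, ns: Namespace) -> Self {
        FieldId::new(ns, self.column())
    }
}

fn main() {
    let user = FieldId::new(Namespace::UserData, 7);
    let shadow = user.in_namespace(Namespace::RowIdShadow);
    println!("user={:#018x} shadow={:#018x}", user.0, shadow.0);
    println!("{:?} column {}", shadow.namespace(), shadow.column());
}
```

With this kind of layout, a user column and its shadow or MVCC companions share a column number but never share a key.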

Test Coverage

  • SQLite suites: The storage layer powers every SQLite sqllogictest case that upstream publishes. Passing those suites provides a baseline for SQLite compatibility, but LLKV still diverges from SQLite behavior in places and should not be treated as a drop-in replacement yet.
  • DuckDB extensions: DuckDB-focused suites exercise MVCC edge cases and typed transaction flows. Coverage is early and informs the roadmap rather than proving full DuckDB parity today.

All suites run through the sqllogictest crate.

Thread Safety

ColumnStore is thread-safe (Send + Sync) with internal locking for catalog updates. Read operations can occur concurrently; writes are serialized through the catalog lock.
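
The concurrent-reads, serialized-writes behavior can be illustrated with a standard `RwLock` around a catalog map. This is a sketch of the general pattern only: `Store`, `append`, and `row_count` are hypothetical, and the real `ColumnStore` locking is likely more fine-grained.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for a store whose catalog sits behind one lock.
#[derive(Default)]
struct Store {
    catalog: RwLock<HashMap<u64, Vec<i64>>>,
}

impl Store {
    /// Writers take the exclusive lock, so appends are serialized.
    fn append(&self, column: u64, values: &[i64]) {
        let mut catalog = self.catalog.write().expect("catalog lock poisoned");
        catalog.entry(column).or_default().extend_from_slice(values);
    }

    /// Readers take the shared lock, so many reads can proceed at once.
    fn row_count(&self, column: u64) -> usize {
        let catalog = self.catalog.read().expect("catalog lock poisoned");
        catalog.get(&column).map_or(0, Vec::len)
    }
}

/// Compiles only if `T` is both Send and Sync.
fn assert_send_sync<T: Send + Sync>() {}

/// Spawns four writer threads against one shared store and returns the final row count.
fn run_demo() -> usize {
    assert_send_sync::<Store>();
    let store = Arc::new(Store::default());
    let writers: Vec<_> = (0..4)
        .map(|i| {
            let store = Arc::clone(&store);
            thread::spawn(move || store.append(1, &[i, i + 10]))
        })
        .collect();
    for writer in writers {
        writer.join().expect("writer thread panicked");
    }
    store.row_count(1)
}

fn main() {
    println!("rows in column 1: {}", run_demo());
}
```

The `assert_send_sync` helper is a compile-time check: if a field that is not thread-safe were added to `Store`, the call would fail to compile.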

Macros and Type Dispatch

This crate provides macros for efficient type-specific operations without runtime dispatch overhead. See with_integer_arrow_type! for details.
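
The general technique behind such macros is to match on a runtime type tag once and then run a body that is compiled separately for each concrete type. The sketch below shows that idea with a hypothetical `with_int_type!` macro and an `IntType` enum standing in for Arrow's data types; it does not reproduce the signature of `with_integer_arrow_type!`.

```rust
/// Stand-in for a runtime type tag (the real stack uses Arrow data types).
#[derive(Debug, Clone, Copy)]
enum IntType {
    I32,
    I64,
    U32,
    U64,
}

/// Hypothetical macro: binds `$ty` to a concrete Rust type for each tag,
/// then expands `$body` once per arm so each copy is statically typed.
macro_rules! with_int_type {
    ($tag:expr, $ty:ident => $body:expr) => {
        match $tag {
            IntType::I32 => { type $ty = i32; $body }
            IntType::I64 => { type $ty = i64; $body }
            IntType::U32 => { type $ty = u32; $body }
            IntType::U64 => { type $ty = u64; $body }
        }
    };
}

/// Width of the tagged type, resolved through the macro.
fn width_in_bytes(tag: IntType) -> usize {
    with_int_type!(tag, T => std::mem::size_of::<T>())
}

/// Largest value of the tagged type, widened to i128 so every arm agrees on a type.
fn max_value(tag: IntType) -> i128 {
    with_int_type!(tag, T => T::MAX as i128)
}

fn main() {
    for tag in [IntType::I32, IntType::I64, IntType::U32, IntType::U64] {
        println!("{:?}: {} bytes, max {}", tag, width_in_bytes(tag), max_value(tag));
    }
}
```

The single `match` happens once per call; inside each arm the body works with a concrete type, so per-value operations need no dynamic dispatch.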


LLKV Column Map

llkv-column-map implements the ColumnStore, the Arrow-native columnar storage engine for the LLKV stack. It maps logical fields to pager-managed physical chunks, enabling efficient scans, appends, and MVCC bookkeeping.

This crate is not intended for direct standalone use.

License

Licensed under the Apache-2.0 License.

Dependencies

~25MB
~429K SLoC