0.1.0	Aug 31, 2024

MIT license

145KB

Versioned Log Stream Merkle Tree

⚠️ This repository is a work in progress and is not production ready 🚧

Versioned Log Stream Merkle Tree is an authenticated key-value store optimized for large-scale use.
Origin: LETUS (See DMM-Trie plus VDLS).

Objectives

Authentication: provide commitment (state root) and inclusion proofs for key-value pairs.
Multi-version: track and provide authenticated historical states.
Performance: minimize I/O amplifications of the underlying storage.

The current design utilizes a 16-way trie as Merkle Tree.
Each leaf node records:
- $V$ the data version.
- $k$ the key.
- $l$ the location of the node in the underlying storage.
- $h = H(k, v)$ the Merkle hash of this node.
Each index node maintains 16 slots and a bitmap $b$ to indicate whether a slot is not empty.
Occupied slots record data version $V_c$, Merkle hash $h_c$, and a memory reference $r$ of its child node.
The Merkle hash of an index node is computed after concatenating the Merkle hash of its child nodes.
The root hash of the VLSM tree is recursively calculated with the child nodes.
Nodes are routed via 4-bit nibbles, with multiple nibbles spliced to form a nibble path.
Nibbles are derived from $H(k)$'s prefix to ensure balance and resistance to attacks.
The VLSM tree structure is deterministic given a set of key-value pairs, as the splitting and merging of VLSM tree nodes depend solely on the nibbles.
The root hash is, therefore, deterministic by the dataset, regardless of the order of the updates.

VLSM tree supports data multi-versioning by maintaining the data version $V$ in each node and allows queries of value data, root hash, and Merkle proof for each version.
$V$ is a comparable 64-bit integer, such as a block number or transaction number.
When data is updated, the $V$ and Merkle hash $h$ of the nodes along the vertical path from root to leaf need to be updated.
Redirect-on-write (RoW) is employed to optimize memory usage, which prevents multi-versioned data structures from copying unnecessary data.
Although historical versions are necessary for auditability and verification of data changes, they are generally infrequently accessed.
The storage engine should skew resources, such as memory, towards the latest versions to accelerate transaction processing.