1 unstable release

0.1.0 Aug 31, 2024

#1835 in Data structures

MIT license

145KB

Versioned Log Stream Merkle Tree

⚠️ This repository is a work in progress and is not production ready 🚧

  • Versioned Log Stream Merkle Tree is an authenticated key-value store optimized for large-scale use.
  • Origin: LETUS (See DMM-Trie plus VDLS).

Objectives

  1. Authentication: provide commitment (state root) and inclusion proofs for key-value pairs.
  2. Multi-version: track and provide authenticated historical states.
  3. Performance: minimize I/O amplifications of the underlying storage.

Architecture

Tree Structure

  • The current design utilizes a 16-way trie as Merkle Tree.
  • Each leaf node records:
    • $V$ the data version.
    • $k$ the key.
    • $l$ the location of the node in the underlying storage.
    • $h = H(k, v)$ the Merkle hash of this node.
  • Each index node maintains 16 slots and a bitmap $b$ to indicate whether a slot is not empty.
  • Occupied slots record data version $V_c$, Merkle hash $h_c$, and a memory reference $r$ of its child node.
  • The Merkle hash of an index node is computed after concatenating the Merkle hash of its child nodes.
  • The root hash of the VLSM tree is recursively calculated with the child nodes.
  • Nodes are routed via 4-bit nibbles, with multiple nibbles spliced to form a nibble path.
  • Nibbles are derived from $H(k)$'s prefix to ensure balance and resistance to attacks.
  • The VLSM tree structure is deterministic given a set of key-value pairs, as the splitting and merging of VLSM tree nodes depend solely on the nibbles.
  • The root hash is, therefore, deterministic by the dataset, regardless of the order of the updates.

Multi-Versioning

  • VLSM tree supports data multi-versioning by maintaining the data version $V$ in each node and allows queries of value data, root hash, and Merkle proof for each version.
  • $V$ is a comparable 64-bit integer, such as a block number or transaction number.
  • When data is updated, the $V$ and Merkle hash $h$ of the nodes along the vertical path from root to leaf need to be updated.
  • Redirect-on-write (RoW) is employed to optimize memory usage, which prevents multi-versioned data structures from copying unnecessary data.
  • Although historical versions are necessary for auditability and verification of data changes, they are generally infrequently accessed.
  • The storage engine should skew resources, such as memory, towards the latest versions to accelerate transaction processing.

No runtime deps