34 releases (15 breaking)

0.17.1 Jan 9, 2022
0.16.0 Jul 15, 2021
0.9.3 Mar 26, 2021
0.3.0 Sep 15, 2020

#1625 in Data structures

28 downloads per month
Used in banyan-utils

MIT/Apache

165KB
4K SLoC

Banyan trees

Banyan trees in nature are the trees with the widest canopy. They also sprout new roots, which after some time can become indistinguishable from the original root.

The banyan trees in this library are persistent trees with a high branching factor (wide canopy) that can be used to efficiently store and query large sequences of key value pairs.

Banyan trees are optimized for appending new entries and filtered, streaming reading of values. They naturally support addressing elements by index. They do not support arbitrary insertion or deletion of elements, so they are not a general purpose map data structure.

They are not balanced trees, but they share some properties with B-Trees.

Persistence

Banyan trees are persistent, using a content-addressed storage system such as ipfs or a key value store. Data is CBOR encoded and zstd compressed for space efficient persistent storage and replication. It is also encrypted using the chacha20 stream cipher.

Indexing

Each banyan tree entry consists of a key part and a value part. The value part is an opaque value that can be serialized as cbor. This can not be used for filtering in queries. The key part can be used for efficient querying by defining how to compute summaries from keys. These summaries will be stored higher up in the tree to allow for efficient querying and filtered streaming.

In addition to querying by key content, elements can always be efficiently accessed by offset.

Sealing and packing

Sealing

A leaf node of the tree is considered sealed as soon as the compressed data exceeds a certain size. This depends on configuration and compression rate. A branch node is considered sealed as soon as it has a certain, configurable number of child nodes.

Packing

Banyan trees are not balanced, since they do not have to support arbitrary insertion and deletion. There is a concept somewhat related to balancing called packing. A tree is considered packed if all but the last leaf node is packed, and branch nodes are packed to the left as much as possible. This is usually the most space efficient representation of the sequence of key value pairs, and also the most efficient one for subsequent appending.

Unpacked trees

Packing a tree after each append of a single entry or a small number of entries would be inefficient, since it would involve recreating a new branch node for each level and a leaf node.

To allow for quick append, it is possible to add elements without packing. This creates a linked list of trees, and in the case of adding individual elements, a degenerate tree resembling a linked list.

Unpacked trees can be packed at any time without changing the content.

Dependencies

~9–17MB
~216K SLoC