#matrix #bit #tree #collection #sparse-matrix #bit-matrix

k2_tree

A space-efficient representation of sparsely populated bit-matrices

15 releases

0.5.3 Apr 12, 2022
0.5.2 Jan 21, 2021
0.4.3 Sep 19, 2020
0.3.2 Aug 18, 2020
0.1.0 Jun 3, 2020

#1088 in Data structures

MIT license

140KB
3K SLoC

Build Status API

k2_tree

A collection designed to efficiently compress sparsely-populated bit-matrices.

See the original proposal here.

Note: This library heavily relies upon bitvec to optimally store its data, which is very slow when compiled without optimisations!

Usage

Add k2_tree into your project dependencies:

[dependencies]
k2_tree = "0.5.2"

When K2Trees are Useful:

K2Trees are extremely efficient at representing data that can be encoded as a two-dimensional bit-matrix, especially if said matrix is sparsely populated.

Take a real-world example: representing Web-Graphs.

Hyperlinks between webpages can be encoded as a 2-d bit-matrix, where each column/row corresponds to a specific page and each bit denotes whether two pages are joined via a hyperlink; 1 if yes, 0 if not.

These adjacency-matrices tend to be extremely sparse most of the time, making the K2Tree the perfect structure for encoding them!

Another example is representing Triple-Stores, which this repo demonstrates is effective.

Example:

Original Bit-Matrix:

00|00||10|10
00|00||00|11
------------
00|00||00|00
00|00||00|10
============
10|10||00|11
10|00||00|00
------------
00|00||00|00
00|00||00|00

Bit-Representation:

[0111; 1101, 1100, 0100; 1000, 1011, 0010, 1010, 1000, 1100]

(Where ; separates layers and , separates blocks)

Final K2Tree:

K2Tree {
  stem_k: 2, // usize
  leaf_k: 2, // usize
  max_slayers: 2, // usize
  stems: [0111110111000100], // BitVec
  leaves: [100010110010101010001100], // BitVec
}

For a more in-depth explanation of the compression process, check this out.

The Road to 1.0:

  • Make K2Tree work over any value of K.
  • Separate the k field into two distinct fields: stem_k, leaf_k.
  • Increase compression ratio by removing the stem_to_leaf and slayer_starts field without compromising operation complexity.
  • Implement serde's Serialize and Deserialize traits.
  • Unit test all the things.
  • Stabilise the API.

- GGabi

Dependencies

~1.3–1.9MB
~45K SLoC