#collection #tree #matrix #bit #bit-matrix

k2_tree

A space-efficient representation of sparsely populated bit-matrices

14 releases (4 breaking)

0.5.2 Jan 21, 2021
0.5.1 Jan 16, 2021
0.4.3 Sep 19, 2020
0.3.2 Aug 18, 2020
0.1.0 Jun 3, 2020

#461 in Data structures

Download history 43/week @ 2021-03-31 16/week @ 2021-04-07 23/week @ 2021-04-14 35/week @ 2021-04-21 14/week @ 2021-04-28 1/week @ 2021-05-05 14/week @ 2021-05-12 1/week @ 2021-05-19 14/week @ 2021-06-09 14/week @ 2021-07-14

117 downloads per month

MIT license

135KB
3K SLoC

Build Status API

k2_tree

A collection designed to efficiently compress sparsely-populated bit-matrices.

See the original proposal here.

Note: This library heavily relies upon bitvec to optimally store its data, which is very slow when compiled without optimisations!

Usage

Add k2_tree into your project dependencies:

[dependencies]
k2_tree = "0.5.2"

When K2Trees are Useful:

K2Trees are extremely efficient at representing data that can be encoded as a two-dimensional bit-matrix, especially if said matrix is sparsely populated.

Take a real-world example: representing Web-Graphs.

Hyperlinks between webpages can be encoded as a 2-d bit-matrix, where each column/row corresponds to a specific page and each bit denotes whether two pages are joined via a hyperlink; 1 if yes, 0 if not.

These adjacency-matrices tend to be extremely sparse most of the time, making the K2Tree the perfect structure for encoding them!

Another example is representing Triple-Stores, which this repo demonstrates is effective.

Example:

Original Bit-Matrix:

00|00||10|10
00|00||00|11
------------
00|00||00|00
00|00||00|10
============
10|10||00|11
10|00||00|00
------------
00|00||00|00
00|00||00|00

Bit-Representation:

[0111; 1101, 1100, 0100; 1000, 1011, 0010, 1010, 1000, 1100]

(Where ; separates layers and , separates blocks)

Final K2Tree:

K2Tree {
  stem_k: 2, // usize
  leaf_k: 2, // usize
  max_slayers: 2, // usize
  stems: [0111110111000100], // BitVec
  leaves: [100010110010101010001100], // BitVec
}

For a more in-depth explanation of the compression process, check this out.

The Road to 1.0:

  • Make K2Tree work over any value of K.
  • Separate the k field into two distinct fields: stem_k, leaf_k.
  • Increase compression ratio by removing the stem_to_leaf and slayer_starts field without compromising operation complexity.
  • Implement serde's Serialize and Deserialize traits.
  • Unit test all the things.
  • Stabilise the API.

- GGabi

Dependencies

~1.6–2.2MB
~46K SLoC