15 releases
0.5.3 | Apr 12, 2022 |
---|---|
0.5.2 | Jan 21, 2021 |
0.4.3 | Sep 19, 2020 |
0.3.2 | Aug 18, 2020 |
0.1.0 | Jun 3, 2020 |
#1447 in Data structures
140KB
3K
SLoC
k2_tree
A collection designed to efficiently compress sparsely-populated bit-matrices.
See the original proposal here.
Note: This library heavily relies upon bitvec to optimally store its data, which is very slow when compiled without optimisations!
Usage
Add k2_tree
into your project dependencies:
[dependencies]
k2_tree = "0.5.2"
When K2Tree
s are Useful:
K2Tree
s are extremely efficient at representing data that can be encoded as a two-dimensional bit-matrix, especially if said matrix is sparsely populated.
Take a real-world example: representing Web-Graphs.
Hyperlinks between webpages can be encoded as a 2-d bit-matrix, where each column/row corresponds to a specific page and each bit denotes whether two pages are joined via a hyperlink; 1 if yes, 0 if not.
These adjacency-matrices tend to be extremely sparse most of the time, making the K2Tree
the perfect structure for encoding them!
Another example is representing Triple-Stores, which this repo demonstrates is effective.
Example:
Original Bit-Matrix:
00|00||10|10
00|00||00|11
------------
00|00||00|00
00|00||00|10
============
10|10||00|11
10|00||00|00
------------
00|00||00|00
00|00||00|00
Bit-Representation:
[0111; 1101, 1100, 0100; 1000, 1011, 0010, 1010, 1000, 1100]
(Where ;
separates layers and ,
separates blocks)
Final K2Tree
:
K2Tree {
stem_k: 2, // usize
leaf_k: 2, // usize
max_slayers: 2, // usize
stems: [0111110111000100], // BitVec
leaves: [100010110010101010001100], // BitVec
}
For a more in-depth explanation of the compression process, check this out.
The Road to 1.0:
- Make
K2Tree
work over any value of K. - Separate the
k
field into two distinct fields:stem_k
,leaf_k
. - Increase compression ratio by removing the
stem_to_leaf
andslayer_starts
field without compromising operation complexity. - Implement serde's
Serialize
andDeserialize
traits. - Unit test all the things.
- Stabilise the API.
- GGabi
Dependencies
~1.2–1.8MB
~43K SLoC