# HIerarchical BITmap TREE
Hierarchical bitmap tree is an integer-key, fixed-depth prefix-tree with no memory overhead[^mem_overhead]. It has unique[^unique_ops], blazingly fast inter-container intersection[^unparalleled_intersection] and union, and outperforms `HashMap<u32, T>`[^hashmap_conf] most of the time. Think of it as a map that can do set things, and do them MUCH more efficiently[^intersection_efficiency] than traditional set operations combined with map lookups.

Always stable O(1) random access.

Predictable insert/remove performance: no tree balancing, no hidden performance impacts.

Ordered by key[^sorting]. Has unordered contiguous iteration[^unordered_iter] as well.

Fast inter-container equality and ordering [not yet implemented].
[^intersection_efficiency]: An intersection operation directly over the data container is much faster than intersecting a set and then getting items from a tree/map, since with intersection directly over the tree we skip the additional tree-traverse phase needed to actually get the data.
[^mem_overhead]: Tree nodes store child pointers in a dense format: null pointers are not stored. Child access is still O(1), as in the traditional sparse format.
[^hashmap_conf]: `HashMap<u32, T>` with nohash-hasher and a uniform key distribution: ideal conditions for HashMap.
[^unparalleled_intersection]: Since the tree itself is a form of hierarchical bitmap, it naturally acts as an intersection acceleration structure.
[^unique_ops]: Tree/map data structures usually do not provide such operations.
[^sorting]: Which means it can be used for sort acceleration.
[^unordered_iter]: Unordered iteration is as fast as iterating a Vec.
## Examples
### Sparse vector dot product
examples/readme_sparse_vec_dot.rs
```rust
type SparseVec = DenseTree<f32, 4>;

let mut v1: SparseVec = Default::default();
v1.insert(10, 1.0);
v1.insert(20, 10.0);
v1.insert(30, 100.0);

let mut v2: SparseVec = Default::default();
v2.insert(10, 1.0);
v2.insert(30, 0.5);

let mul = intersection(&v1, &v2)    // lazy element-wise mul
    .map(|(e1, e2): (&f32, &f32)| e1 * e2);
let dot: f32 = mul.iter().map(|(_index, element)| element).sum();
assert_eq!(dot, 51.0);
```
### Multi-type intersection
examples/readme_multi_type_example.rs
```rust
// index is user-id.
type Tree<T> = DenseTree<T, 4>;

let mut ages: Tree<usize> = Default::default();
ages.insert(100, 20);
ages.insert(200, 30);
ages.insert(300, 40);

let mut names: Tree<String> = Default::default();
names.insert(200, "John".into());
names.insert(234, "Zak".into());
names.insert(300, "Ernie".into());

let users = intersection(&ages, &names)
    .map(|(i, s): (&usize, &String)| format!("{s} age: {i}"));
assert_equal(users.iter(), [
    (200, String::from("John age: 30")),
    (300, String::from("Ernie age: 40")),
]);
```
### Multi-tree intersection with in-place filter[^inplace_filter]
examples/readme_store_example.rs
```rust
/// [store_id; good_amount]
type Goods = DenseTree<usize, 4>;

let mut apples: Goods = Default::default();
apples.insert(0, 12);
apples.insert(3, 40);

let mut oranges: Goods = Default::default();
oranges.insert(0, 4);
oranges.insert(1, 15);
oranges.insert(3, 40);

let mut carrots: Goods = Default::default();
carrots.insert(1, 5);
carrots.insert(3, 100);

// We want 5 apples, 20 oranges, 7 carrots - from the SAME store.
let goods            = [&apples, &oranges, &carrots];
let min_goods_amount = [5,       20,       7       ];

let intersection = multi_intersection(goods.iter().copied());
let mut iter = intersection.iter();
while let Some((store_id, goods_amount /*: impl Iterator<usize> */)) =
    LendingIterator::next(&mut iter)
{
    // `goods_amount` iterator has the same order as `goods`.
    // The store qualifies only if EVERY good meets its minimum amount.
    let contains =
        iter::zip(goods_amount.clone(), min_goods_amount.iter())
        .all(|(amount, min)| *min <= amount);
    if !contains { continue; }

    println!("found at store {store_id} : {:?}", goods_amount.collect::<Vec<_>>());
    break;
}
```
[^inplace_filter]: hibit_tree acquires the intersected data as it goes. This is much faster than doing a set intersection and then getting data by keys, since there is no additional tree traverse.
## Use cases

### Sparse-vector to sparse-vector multiplication

For a dot product we need to element-wise multiply two vectors, then sum the result's elements. Multiplying by zero gives zero, and a sparse vector is mostly filled with zeroes, so after element-wise multiplication only the elements that are non-null in BOTH sources will be non-null.

Let's represent a hierarchical bitmap tree as a sparse vector: we store only the non-zero elements in the tree. The intersection of two such trees returns pairs of non-zero elements. By mapping each pair to a multiplication, we get the element-wise multiplied sparse vector. By summing all its elements, we get the dot product.
### Compressed bitset

hi_sparse_bitset basically uses the same structure as hibit_tree, and can be considered a special case of hibit_tree.
## How it works

Hierarchical bitmap tree is a form of prefix tree. Each node has a fixed number of children, and the level count is fixed. This allows fast calculation of in-node indices for by-index access, without touching nodes, spending ~1 cycle per level.

Though each node has a fixed number of child slots, we store only the non-null children, using the "bitblock map" technique for access.
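As a sketch of that per-level index calculation: with 64 children per node, each level consumes 6 bits of the key, so extracting an in-node index is just a shift and a mask per level. The constants and names below are illustrative, not the crate's actual parameters.

```rust
// Split an integer key into per-level in-node child indices.
// Assumes 64 children per node => 6 key bits per level (illustrative only).
const BITS_PER_LEVEL: u32 = 6;

fn level_indices<const LEVELS: usize>(key: u32) -> [usize; LEVELS] {
    let mut indices = [0; LEVELS];
    for level in 0..LEVELS {
        // The root level consumes the highest bits; each level is shift + mask.
        let shift = BITS_PER_LEVEL * (LEVELS as u32 - 1 - level as u32);
        indices[level] = ((key >> shift) & 0b11_1111) as usize;
    }
    indices
}

fn main() {
    // Key 0b000001_000010 => child 1 at the root, child 2 at the next level.
    let [root, leaf] = level_indices::<2>(0b000001_000010);
    assert_eq!((root, leaf), (1, 2));
}
```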
```
          Node
         ┌─────────────┐
         │ 64-bit mask │ ◁ Mask of existent children. Acts as a sparse index array.
Level0   │ cap         │
         │ [*Node;cap] │ ◁ Dense array as FAM.
         └─────────────┘
               ...
          Node
         ┌─────────────┐
         │ 64-bit mask │
LevelN   │ cap         │
         │ [usize;cap] │ ◁ Data indices.
         └─────────────┘

         ┌   ┐
Data Vec │ T │
         └   ┘
```
A node is like a C99 object with a flexible array member (FAM), which means memory is allocated only for the existent child pointers. The node bitmask has raised bits at the children's indices. Children are always stored ordered by their indices.
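A minimal sketch of such a node and its child lookup. The real crate packs the children into a single allocation as a FAM; a `Vec` stands in here, and the names are hypothetical.

```rust
// Simplified sketch of a compressed node.
struct Node {
    mask: u64,           // bit i set <=> child with in-node index i exists
    children: Vec<Node>, // dense, ordered by index; len == mask.count_ones()
}

impl Node {
    fn get_child(&self, index: u32) -> Option<&Node> {
        if self.mask & (1u64 << index) == 0 {
            return None; // child absent - no memory was spent on it
        }
        // Count the set bits below `index` to get the dense-array position.
        let dense = (self.mask & !(u64::MAX << index)).count_ones() as usize;
        Some(&self.children[dense])
    }
}

fn main() {
    let leaf = |mask| Node { mask, children: vec![] };
    // Children exist at in-node indices 1 and 6; only two slots are allocated.
    let node = Node { mask: (1 << 1) | (1 << 6), children: vec![leaf(10), leaf(20)] };
    assert!(node.get_child(0).is_none());
    assert_eq!(node.get_child(6).unwrap().mask, 20);
}
```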
### "Bitblock map" technique

Maps a sparse index in a bitblock to data in a dense array.

```
               0 1       2 3        ◁═ popcnt before bit (dense_array index)
bit_block    0 1 1 0 0 0 1 1 0 0 ...
               1 2       6 7        ◁═ bit index (sparse index)
               │ │       │ │
               ▼ ▼       ▼ ▼
dense_array    1, 32,    4, 5       len = bit_block popcnt

1 => 1  (dense_array[0])
2 => 32 (dense_array[1])
6 => 4  (dense_array[2])
7 => 5  (dense_array[3])
```

Dense-array elements must always be in the same order as the sparse indices.
On x86, getting the dense index costs 3-4 cycles with bmi1:

```rust
(bit_block & !(u64::MAX << index)).count_ones()
```

and just 2 cycles with bmi2:

```rust
_bzhi_u64(bit_block, index).count_ones()
```
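Wiring the diagram's values together as a self-contained check (`dense_index` is a hypothetical helper name wrapping the bmi1 expression above):

```rust
// popcnt of the bits strictly below `sparse_index` gives the dense position.
fn dense_index(bit_block: u64, sparse_index: u32) -> usize {
    (bit_block & !(u64::MAX << sparse_index)).count_ones() as usize
}

fn main() {
    let bit_block: u64 = 0b1100_0110; // sparse bits 1, 2, 6, 7 are set
    let dense_array = [1u32, 32, 4, 5];
    assert_eq!(dense_array[dense_index(bit_block, 1)], 1);
    assert_eq!(dense_array[dense_index(bit_block, 2)], 32);
    assert_eq!(dense_array[dense_index(bit_block, 6)], 4);
    assert_eq!(dense_array[dense_index(bit_block, 7)], 5);
}
```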
### Switching to uncompressed node

[Unimplemented]

Since children must always remain sorted, insert and remove are technically O(N). That's not a problem for most nodes, since they will have just a few children. But for dense nodes it is reasonable to switch to "uncompressed" data storage, where insert and remove are O(1).

To do this without introducing branching, we add another bitmask. If the children count is less than some threshold (32), it is equal to the original one; otherwise it is filled with ones. We use this second mask for bitblock mapping: below the threshold it works as usual; above it, the dense index equals the sparse index (because all bits before the requested sparse index are ones).
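A sketch of that second-mask trick (names and threshold handling are illustrative). The saturated mask would be maintained on insert/remove; the lookup itself stays the same popcnt mapping with no "compressed or not" branch:

```rust
const THRESHOLD: u32 = 32;

// Maintained on insert/remove, not on lookup.
fn access_mask(real_mask: u64) -> u64 {
    if real_mask.count_ones() < THRESHOLD { real_mask } else { u64::MAX }
}

// Unchanged popcnt mapping; with an all-ones mask it degenerates to identity.
fn dense_index(access_mask: u64, sparse_index: u32) -> usize {
    (access_mask & !(u64::MAX << sparse_index)).count_ones() as usize
}

fn main() {
    // Compressed node: mapping works as usual.
    let compressed = access_mask(0b1100_0110);
    assert_eq!(dense_index(compressed, 6), 2);
    // Uncompressed node: dense index == sparse index.
    let uncompressed = access_mask(u64::MAX);
    assert_eq!(dense_index(uncompressed, 6), 6);
}
```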
### Hierarchical bitmap

The bitmasks in the nodes form a hierarchical bitset/bitmap, similar to hi_sparse_bitset, which in turn was derived from hibitset. This means that all operations of hierarchical bitsets are possible here, with the same performance characteristics.
### Intersection

The bread and butter of hibit_tree is super-fast intersection. The intersected (or, you may say, common) part of the tree is computed on the fly by simply AND-ing the nodes' bitmasks. The resulting bitmask is then traversed: the children at the indices of its set bits are used to dive deeper, down to the terminal node. Hence, tree branches that have no common keys are discarded early. The more trees you intersect at once, the greater the benefit of this technique.

Alas, it is possible to end up with an empty bitmask only at the terminal node. But even in this case it is faster than other solutions, since we check multiple intersection possibilities at once. In the worst case it degenerates to intersecting bitvecs without empty blocks, which is quite fast by itself.
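The per-level step can be sketched as one AND over two node masks, then visiting only the surviving children (a std-only sketch with a hypothetical helper name):

```rust
// Yield the in-node indices present in BOTH nodes, lowest first.
fn common_children(mask_a: u64, mask_b: u64) -> impl Iterator<Item = u32> {
    let mut common = mask_a & mask_b; // one AND discards all non-shared branches
    std::iter::from_fn(move || {
        if common == 0 {
            return None;
        }
        let index = common.trailing_zeros();
        common &= common - 1; // clear the lowest set bit
        Some(index)
    })
}

fn main() {
    let a = (1u64 << 1) | (1 << 2) | (1 << 6);
    let b = (1u64 << 2) | (1 << 6) | (1 << 7);
    let shared: Vec<u32> = common_children(a, b).collect();
    assert_eq!(shared, vec![2, 6]); // only these subtrees are descended into
}
```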
### Laziness

All inter-container operations return lazy trees, which can be used further in inter-tree operations. Lazy trees can be materialized into a concrete container.
TODO: EXAMPLE
## Design choices

The tree has a compile-time defined depth and width. This performs significantly better than a traditional dynamic structure. And since the key is an integer, we need at most 8-11 levels of depth, so any memory overhead is negligible due to the tiny node sizes. We assume, though, that most of the time users will work near the u32 index space (4-6 levels).
### Dynamic tree

It is possible to have a dynamic-depth hibit-tree (with single-child nodes skipped from the hierarchy). But on shallow trees (~4 levels / 32-bit range) the fixed-depth tree ALWAYS outperforms it significantly, and we expect these shallow 32-bit-range trees to be used the most.

Plus, for a dynamic tree to be beneficial, the tree must be populated very sparsely across the index range: as soon as there are no single-child nodes, a dynamic tree has the same number of nodes in depth as a fixed one.
### Uncompressed tree

The lib also provides a version of the tree with uncompressed nodes (with a fixed-size array of children). It has higher memory overhead, but is a little faster, since it doesn't need to do bitblock mapping. It still has a much lower memory footprint than using a plain Vec for sparse data, while being faster than a HashMap.

You may need it if your target platform has slow bit-manipulation operations, or if memory overhead is not an issue and you want inter-tree intersection as fast as possible.

It can also use SIMD-sized (and accelerated) bitblocks in nodes.