#btree-set #set #btree-map #order-statistics #map

indexset

A two-level BTree with fast iteration and indexing operations

14 unstable releases (4 breaking)

0.5.0 Sep 18, 2024
0.4.1 Aug 19, 2024
0.4.0 May 24, 2024
0.3.8 Feb 17, 2024
0.1.0 Jul 4, 2023

#200 in Data structures

Download history 270/week @ 2024-08-16 286/week @ 2024-08-23 206/week @ 2024-08-30 216/week @ 2024-09-06 438/week @ 2024-09-13 469/week @ 2024-09-20 420/week @ 2024-09-27 290/week @ 2024-10-04 397/week @ 2024-10-11 345/week @ 2024-10-18 274/week @ 2024-10-25 248/week @ 2024-11-01 324/week @ 2024-11-08 333/week @ 2024-11-15 258/week @ 2024-11-22 192/week @ 2024-11-29

1,136 downloads per month
Used in materialized-view

Apache-2.0 OR MIT

190KB
4K SLoC

indexset

crates.io docs

A pure-Rust two-level dynamic order-statistic b-tree.

This crate implements a compact set data structure that preserves its elements' sorted order and allows lookups of entries by value or sorted order position.

Also, it is(mostly) a drop-in replacement for the stdlib BTree.

Under the feature concurrent you can find a persistent version of the BTree that can be fearlessly shared between threads.

Background

This was heavily inspired by indexmap, and python's sortedcontainers.

It differs from both in that:

  • indexmap is a hashmap that provides numerical lookups, but does not maintain order in case of removals, while indexset is a b-tree that always maintains order, irrespective of which mutating operation is run.
  • sortecontainers is similar in spirit, but utilizes a different routine for balancing the tree, and relies on a heap for numerical lookups.

indexset provides the following features:

  • As fast to iterate as a vec.
  • Zero indirection.
  • Lookups by position and range.
  • Minimal amount of allocations.
  • select(lookups by position) and rank operations in near constant time.

Performance

BTreeSet and BTreeMap derive a couple of performance facts directly from how it is constructed, which is roughly:

A two-level B-Tree with a fenwick tree as a low-cost index for numerical lookups

  • Iteration is very fast since it inherits a vec's iter struct.
  • Insertions and removals are constant time in the best-case scenario, and logarithmic on the node size in the worst case
  • Lookups are very fast, and rely only on two binary searches over very little data.

Benchmarks

run cargo bench and see it for yourself.

On a lowest-specced M1 macbook pro I got the following numbers:

Inserting 100k random usize

  • stdlib::collections::BTreeSet.insert(i): 8.25ms
  • indexset::BTreeSet.insert(i): 17.3ms

Checking that each 100k random usize integers exist

  • stdlib::collections::BTreeSet.contains(i): 7.5ms
  • indexset::BTreeSet.contains(i): 6.8ms

Getting all 100k i-th elements

  • stdlib::collections::BTreeSet.iter.nth(i): 13.28s yes, seconds!
  • indexset::BTreeSet.get_index(i): 3.93ms

Iterating over all 100k elements and then collecting it into a vec

  • Vec::from_iter(stdlib::collections::BTreeSet.iter()): 227.28us
  • Vec::from_iter(indexset::BTreeSet.iter()): 123.21.us

Yes.

Getting the i-th element is 3400x faster than stdlib's btree, contains is 10% faster, and iterating is twice as fast, at the cost of insertions being half as fast.

If your use case of std::collections::BTreeSet and BTreeMap is more read-heavy, or if you really need to index by sorted-order position, it might be worth checking out this indexset instead.

Limitations

  • BTreeMap is less polished than BTreeSet. This crate has been optimised for a leaner BTreeSet.
  • This is beta quality software, so use it at your own risk. I'm not 100% certain about every single iterator implementation(everything has tests though).
  • There's quite a couple utility traits that have not been implemented yet.
  • Concurrent BTreeMap and BtreeSet are lock-free and wait-free. That doesn't mean it is fast however :)

Naming

This library is called indexset, because the base data structure is BTreeSet. BTreeMap is a BTreeSet with a Pair<K, V> item type.

Changelog

See CHANGELOG.md.

Dependencies

~0.4–1MB
~23K SLoC