#btree-set #btree-map #set #map #order-statistics

indexset

A two-level BTree with fast iteration and indexing operations

11 releases

0.3.8 Feb 17, 2024
0.3.7 Feb 17, 2024
0.3.6 Aug 12, 2023
0.3.5 Jul 17, 2023
0.1.0 Jul 4, 2023

#271 in Data structures

Download history 204/week @ 2024-02-16 46/week @ 2024-02-23 45/week @ 2024-03-01 33/week @ 2024-03-08 75/week @ 2024-03-15 24/week @ 2024-03-22 28/week @ 2024-03-29 59/week @ 2024-04-05

191 downloads per month

Apache-2.0 OR MIT

135KB
2.5K SLoC

indexset

crates.io docs

A pure-Rust two-level dynamic order-statistic b-tree.

This crate implements a compact set data structure that preserves its elements' sorted order and allows lookups of entries by value or sorted order position.

Also, it is(mostly) a drop-in replacement for the stdlib BTree.

Background

This was heavily inspired by indexmap, and python's sortedcontainers.

It differs from both in that:

  • indexmap is a hashmap that provides numerical lookups, but does not maintain order in case of removals, while indexset is a b-tree that always maintains order, irrespective of which mutating operation is run.
  • sortecontainers is similar in spirit, but utilizes a different routine for balancing the tree, and relies on a heap for numerical lookups.

indexset provides the following features:

  • As fast to iterate as a vec.
  • Zero indirection.
  • Lookups by position and range.
  • Minimal amount of allocations.
  • select(lookups by position) and rank operations in near constant time.

Performance

BTreeSet and BTreeMap derive a couple of performance facts directly from how it is constructed, which is roughly:

A two-level B-Tree with a fenwick tree as a low-cost index for numerical lookups

  • Iteration is very fast since it inherits a vec's iter struct.
  • Insertions and removals are constant time in the best-case scenario, and logarithmic on the node size in the worst case
  • Lookups are very fast, and rely only on two binary searches over very little data.

Benchmarks

run cargo bench and see it for yourself.

On a lowest-specced M1 macbook pro I got the following numbers:

Inserting 100k random usize

  • stdlib::collections::BTreeSet.insert(i): 8.25ms
  • indexset::BTreeSet.insert(i): 17.3ms

Checking that each 100k random usize integers exist

  • stdlib::collections::BTreeSet.contains(i): 7.5ms
  • indexset::BTreeSet.contains(i): 6.8ms

Getting all 100k i-th elements

  • stdlib::collections::BTreeSet.iter.nth(i): 13.28s yes, seconds!
  • indexset::BTreeSet.get_index(i): 3.93ms

Iterating over all 100k elements and then collecting it into a vec

  • Vec::from_iter(stdlib::collections::BTreeSet.iter()): 227.28us
  • Vec::from_iter(indexset::BTreeSet.iter()): 123.21.us

Yes.

Getting the i-th element is 3400x faster than stdlib's btree, contains is 10% faster, and iterating is twice as fast, at the cost of insertions being half as fast.

If your use case of std::collections::BTreeSet and BTreeMap is more read-heavy, or if you really need to index by sorted-order position, it might be worth checking out this indexset instead.

Limitations

  • BTreeMap is less polished than BTreeSet. This crate has been optimised for a leaner BTreeSet.
  • This is beta quality software, so use it at your own risk. I'm not 100% certain about every single iterator implementation(everything has tests though).
  • There's quite a couple utility traits that have not been implemented yet.

Naming

This library is called indexset, because the base data structure is BTreeSet. BTreeMap is a BTreeSet with a Pair<K, V> item type.

Changelog

See CHANGELOG.md.

Dependencies

~0.5–1MB
~24K SLoC