14 unstable releases (4 breaking)
0.5.0 | Sep 18, 2024 |
---|---|
0.4.1 | Aug 19, 2024 |
0.4.0 | May 24, 2024 |
0.3.8 | Feb 17, 2024 |
0.1.0 | Jul 4, 2023 |
#200 in Data structures
1,136 downloads per month
Used in materialized-view
190KB
4K
SLoC
indexset
A pure-Rust two-level dynamic order-statistic b-tree.
This crate implements a compact set data structure that preserves its elements' sorted order and allows lookups of entries by value or sorted order position.
Also, it is(mostly) a drop-in replacement for the stdlib BTree.
Under the feature concurrent
you can find a persistent
version of the BTree that can be fearlessly shared between
threads.
Background
This was heavily inspired by indexmap
, and
python's sortedcontainers
.
It differs from both in that:
indexmap
is a hashmap that provides numerical lookups, but does not maintain order in case of removals, whileindexset
is a b-tree that always maintains order, irrespective of which mutating operation is run.sortecontainers
is similar in spirit, but utilizes a different routine for balancing the tree, and relies on a heap for numerical lookups.
indexset
provides the following features:
- As fast to iterate as a vec.
- Zero indirection.
- Lookups by position and range.
- Minimal amount of allocations.
select
(lookups by position) andrank
operations in near constant time.
Performance
BTreeSet
and BTreeMap
derive a couple of performance facts directly from how it is constructed, which is roughly:
A two-level B-Tree with a fenwick tree as a low-cost index for numerical lookups
- Iteration is very fast since it inherits a vec's iter struct.
- Insertions and removals are constant time in the best-case scenario, and logarithmic on the node size in the worst case
- Lookups are very fast, and rely only on two binary searches over very little data.
Benchmarks
run cargo bench
and see it for yourself.
On a lowest-specced M1 macbook pro I got the following numbers:
Inserting 100k random usize
stdlib::collections::BTreeSet.insert(i)
: 8.25msindexset::BTreeSet.insert(i)
: 17.3ms
Checking that each 100k random usize integers exist
stdlib::collections::BTreeSet.contains(i)
: 7.5msindexset::BTreeSet.contains(i)
: 6.8ms
Getting all 100k i-th elements
stdlib::collections::BTreeSet.iter.nth(i)
: 13.28s yes, seconds!indexset::BTreeSet.get_index(i)
: 3.93ms
Iterating over all 100k elements and then collecting it into a vec
Vec::from_iter(stdlib::collections::BTreeSet.iter())
: 227.28usVec::from_iter(indexset::BTreeSet.iter())
: 123.21.us
Yes.
Getting the i-th element is 3400x faster than stdlib's btree, contains
is 10% faster, and iterating
is twice as fast, at the cost of insertions being half as fast.
If your use case of std::collections::BTreeSet
and BTreeMap
is more read-heavy, or if you really need to index by
sorted-order position, it might be worth checking out this indexset
instead.
Limitations
BTreeMap
is less polished thanBTreeSet
. This crate has been optimised for a leanerBTreeSet
.- This is beta quality software, so use it at your own risk. I'm not 100% certain about every single iterator implementation(everything has tests though).
- There's quite a couple utility traits that have not been implemented yet.
Concurrent
BTreeMap
andBtreeSet
are lock-free and wait-free. That doesn't mean it is fast however :)
Naming
This library is called indexset
, because the base data structure is BTreeSet
. BTreeMap
is a BTreeSet
with
a Pair<K, V>
item type.
Changelog
See CHANGELOG.md.
Dependencies
~0.4–1MB
~23K SLoC