#set #sparse #unsigned-integer #array #sparse-set

thinset

A data structure for sparse sets of unsigned integers that sacrifices space for speed

4 releases (breaking)

0.4.0 Jan 29, 2024
0.3.0 Jan 15, 2024
0.2.0 Jan 2, 2024
0.1.0 Dec 31, 2023

#377 in Data structures

MIT/Apache

48KB
903 lines

thinset

A data structure optimized for sparse sets of unsigned integers.

crates.io Documentation Rust CI rustc 1.0+

Dependency Status Download Status

Usage

Add this to your Cargo.toml:

[dependencies]
thinset = "0.1"

Description

An implementation of a set and a map using a pair of sparse and dense arrays as backing stores, based on the paper "An efficient representation for sparse sets" (1993) by Briggs and Torczon.

This type of set is useful when you need to efficiently track set membership for integers from a large universe, but the values are relatively spread apart.

Compared to the standard library's HashSet, clearing a set is O(1) instead of O(n). Compared to a bitmap-based set, iteration over the set is proportional to the cardinality of the set (how many elements you have) instead of proportional to the maximum size of the set.

The main downside is that the set uses more memory than other set implementations.

The map behaves identically to the set with the exception that it tracks data alongside the values that are stored in the set. Under the hood, SparseSet is a SparseMap of keys to ().

The table below compares the asymptotic complexities of several set operations for the sparse set when compared a bit set. n is the number of elements in the set and u is the size of the set's universe.

Operation Sparse Set Bit Set
Insertion O(1) O(1)
Removal O(1) O(1)
Lookup O(1) O(1)
Clear O(1) O(u)
Count O(1) O(u)
Iteration O(n) O(u)
Union O(n) O(u)
Intersection O(n) O(u)
Difference O(n) O(u)
Complement O(n) O(u)

Benchmarks

The following benchmarks were run on a 2020 MacBook Pro with a 2 GHz Intel Core i5 processor.

The benchmark compares SparseSet to the standard library's HashSet and the bit-set crate's BitSet.

When inserting 1000 random elements into the set from a universe of [0, 2^16) and then iterating over the set, the sparse set is 4.1x faster than the HashSet and 1.7x faster than the BitSet:

  • SparseSet: 160,329 ns/iter (+/- 55,664)
  • BitSet: 278,428 ns/iter (+/- 42,477)
  • HashSet: 662,964 ns/iter (+/- 56,851)

Benchmarks are available in examples/bench.rs and can be run with the following command:

cargo run --example bench

Examples

use thinset::SparseSet;

let mut s: SparseSet<usize> = SparseSet::new();
s.insert(0);
s.insert(3);
s.insert(7);

s.remove(7);

if !s.contains(7) {
    println!("There is no 7");
}

// Print 0, 1, 3 in some order
for x in s.iter() {
    println!("{}", x);
}
use thinset::{Pair, SparseMap};

let mut m: SparseMap<u32, u32> = SparseMap::new();

m.insert(13, 2);
m.insert(8, 16);

assert_eq!(m.get(13), Some(&2));
assert_eq!(m.get(6), None);

for Pair {key, value} in m.iter() {
    println!("{key}:{value}");
}

License

Dual-licensed for compatibility with the Rust project.

Licensed under the Apache License Version 2.0: http://www.apache.org/licenses/LICENSE-2.0, or the MIT license: http://opensource.org/licenses/MIT, at your option.

Dependencies

~155KB