### 5 releases

0.0.6 | Nov 21, 2022 |
---|---|

0.0.5 | Oct 6, 2020 |

0.0.2 | May 17, 2020 |

#**144** in Data structures

**71** downloads per month

Used in **3** crates
(2 directly)

**MIT/Apache**

58KB

1K
SLoC

# RsDict: Fast rank/select over bitmaps

Rank and select are two useful operations on bitmaps for building more sophisticated data
structures. First, the *rank* at a given index

counts the number of set bits left of `i`

. For
example, a sparse array can be represented as a dense array of the values present and a bitmap
indicating which indices are present. Then, rank provides a function from an index into the sparse
array to an index into the dense one.`i`

*Select* is the inverse of rank, where

returns the index of the `select``(`B`,` k`)`

th set bit. To make
the two inverses, we use zero-indexing for select (so `k`

returns the index of the first
bit set in `select``(`B`,` `0``)`

) and rank only counts indices strictly to the left of `B`

. From our previous example,
`i`

allows going from an index in the dense array to the original sparse array.`select`

This data structure implements these two operations efficiently on top of an append-only bitmap. It's an implementation of Navarro and Providel, "Fast, Small, Simple Rank/Select On Bitmaps", with heavy inspiration from a Go implementation. The underlying bitmap is stored in compressed form, so long runs of zeros and ones do not take up much space. The indices for rank and select add about 28% overhead over the uncompressed bitmap.

For more examples on how to use rank and select to build succinct datastructures, see Navarro's book on Compact Data Structures for an overview.

## Implementation notes

This library is mostly a port of the Go implementation with a few additional optimizations.

### SSE acceleration for rank

With the nightly-only

feature and a CPU with SSSE3 support, the final step of rank is computed
in a few steps without any loops. Turning this feature on improves the `simd`

benchmark by
about 40% on my computer. See `rsdict ::`rank

`rank_acceleration``.`rs

for more details.### Optimized routines for rank and select within a `u64`

`u64`With a CPU that supports

, computing rank within a small block of 64 bits will use this
instruction to efficiently count the number of bits set. Select uses an adapted version of an an
algorithm from Daniel Lemire's
blog that uses `popcnt`

to
quickly skip over runs of trailing zeros.`tzcnt`

### Compact binomial coefficient lookup table

Encoding and decoding blocks of the compressed bitmap requires computing the binomial coefficient

where `B (n, k)`

`0` `<=` k `<=` n `<=` `64`

. Computing this on-the-fly is too expensive, so we store a
precomputed lookup table of the coefficients. However, we exploit the symmetry of `B`

in `k`

to
store only half the values. See `build``.`rs

for more details.## Performance

Here's some results from running the benchmark on my 2018 MacBook Pro with

.`-`C target`-`cpu`=`native

`rsdict``::`rank time`:` `[``10.``330` us `10.``488` us `10.``678` us`]`
Found `4` outliers among `100` measurements `(``4.``00``%``)`
`4` `(``4.``00``%``)` high mild
`jacobson``::`rank time`:` `[``17.``958` us `18.``335` us `18.``740` us`]`
Found `6` outliers among `100` measurements `(``6.``00``%``)`
`1` `(``1.``00``%``)` high mild
`5` `(``5.``00``%``)` high severe
`rank9``::`rank time`:` `[``6.``8907` us `7.``0768` us `7.``2940` us`]`
Found `1` outliers among `100` measurements `(``1.``00``%``)`
`1` `(``1.``00``%``)` high severe
`rsdict``::`select0 time`:` `[``37.``124` us `37.``505` us `37.``991` us`]`
Found `3` outliers among `100` measurements `(``3.``00``%``)`
`3` `(``3.``00``%``)` high severe
`rsdict``::`select1 time`:` `[``29.``782` us `29.``918` us `30.``067` us`]`
Found `7` outliers among `100` measurements `(``7.``00``%``)`
`5` `(``5.``00``%``)` high mild
`2` `(``2.``00``%``)` high severe
`rank9``::``binsearch``::`select0
time`:` `[``229.``64` us `231.``54` us `233.``87` us`]`
Found `5` outliers among `100` measurements `(``5.``00``%``)`
`2` `(``2.``00``%``)` high mild
`3` `(``3.``00``%``)` high severe
`rank9``::``binsearch``::`select1
time`:` `[``253.``69` us `255.``84` us `258.``19` us`]`
Found `9` outliers among `100` measurements `(``9.``00``%``)`
`4` `(``4.``00``%``)` high mild
`5` `(``5.``00``%``)` high severe

So for rank queries, this implementation is faster than

's Jacobson and slightly slower
than its Rank9. However for select queries, it's `succinct-rs`*much* faster than doing binary search over these
rank structures, so consider using this library if select is an important operation for your algorithm.

## Testing

We use QuickCheck for testing data structure invariants. In addition, there's basic AFL fuzz integration
to find interesting test cases using program coverage. Install cargo-afl
and run the

binary with the `rsdict_fuzz`

feature set.`fuzz`

`$`` cargo install afl`
`$`` cargo afl build`` --`release` --`test rsdict_fuzz` --`features fuzz
`#`` Create some starting bitsets within target/fuzz/in and create an empty directory target/fuzz/out.
$ cargo afl fuzz -i target/fuzz/in -o target/fuzz/out target/release/rsdict_fuzz
`

#### Dependencies

~0–1.5MB

~36K SLoC