#bit-vector #benchmark #elias-fano #succinct #sequence

app cseq_benchmark

The program for benchmarking compact sequences and bitmaps

4 releases

0.1.2 Oct 2, 2024
0.1.1 Feb 25, 2024
0.1.0 Feb 25, 2024
0.0.1 Dec 26, 2023

#70 in Compression

MIT/Apache

255KB
3.5K SLoC

cseq_benchmark is the program (by Piotr Beling) for benchmarking compact sequences and bitmaps.

It can test the listed algorithms contained in the following crates:

  • cseq: Elias-Fano (experimental);
  • bitm: rank and select queries on bit vectors;
  • sucds: rank and select queries on bit vectors;
  • succinct: rank and select queries on bit vectors;
  • sux: select queries on bit vectors;
  • vers (only if compiled with vers-vecs feature): rank and select queries on bit vectors.

Please run the program with the --help switch to see the available options.

Below you can find instruction for installing cseq_benchmark and reproducing experiments performed with it, which can be found in published or under review papers. Note that these instructions have been tested under GNU/Linux and may require some modifications for other systems.

Installation

cseq_benchmark can be compiled and installed from sources. To do this, a Rust compiler is needed. The easiest way to obtain the compiler along with other necessary tools (like cargo) is to use rustup.

Please follow the instructions at https://www.rust-lang.org/tools/install.

Once Rust is installed, just execute the following to install cseq_benchmark with native optimizations:

RUSTFLAGS="-C target-cpu=native" cargo install --features=vers-vecs cseq_benchmark

The --features=vers-vecs flag enables compilation of the non-portable vers crate. It should be omitted in case of compilation problems.

Reproducing experiments from the papers

Rust libraries and programs focused on succinct data structures

(Piotr Beling, BSuccinct: Rust libraries and programs focused on succinct data structures, SoftwareX, Volume 26, 2024, 101681, ISSN 2352-7110, https://doi.org/10.1016/j.softx.2024.101681)

Results for structures that support rank and select queries on bit vectors, included in libraries written in Rust (we used rustc 1.75.0 to compile), can be obtained by running:

cseq_benchmark -f -t 60 -c 20 -q 10000000 -u 1000000000 -n 500000000 bv
cseq_benchmark -f -t 60 -c 20 -q 10000000 -u 1000000000 -n 100000000 bv

Notes:

  • The -t 60 -c 20 switches force a long testing time (60s for warming up + about 60s for performing each test + 20s cooling/sleeping between tests). They can be omitted to get results faster, but averaged over fewer repetitions.
  • The -f switch causes the results to be written to files. It also can be skipped, as the results are printed to the screen anyway.

The results for the methods contained in SDSL2 (which is written in C++; we used clang 14.0.6 to compile) can be obtained using the program available at https://github.com/beling/benchmark-succinct (the page also contains compilation instructions) by running:

rank_sel 1000000000 500000000 60 10000000
rank_sel 1000000000 100000000 60 10000000

Benchmark results

Notes on benchmarks of structures supporting rank and select queries on bit vectors

  • We do not distinguish rank0 from rank1 as each is trivially computable from the other.
  • In a bit vector with adversarial distribution and n ones, 99% of them occupy the last n indices.
  • Versions of the tested crates: bitm 0.4.1, succinct 0.5.2, sucds 0.8.1, sux 0.2.0, vers 1.1.0.
  • Structures supporting select marked with * use or are integrated with (in the case of vers RsVec) the corresponding rank structure and the space overhead given for them is additional.
  • We conducted the benchmarks using: long measurement (t=60) and cooling (c=20) times, 107 random query arguments, AMD Ryzen 5600G @3.9GHz CPU and compilation with native optimizations enabled.

Rank for uniform distribution of ones in the bit vector

bit vector length1010109108
percent of ones501050105010
space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]
bitm RankSelect1011113.1233.1233.1213.1213.163.17
succinct JacobsonRank22.85222.85218.84518.84422.71222.713
succinct Rank925.01925.01925.01825.01725.0725.07
sucds Rank9Sel25.01925.01825.01725.01725.0725.07
vers RsVec4.7265.3264.7245.3244.765.36

Select1 for uniform distribution of ones in the bit vector

bit vector length1010109108
percent of ones501050105010
space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]
bitm RankSelect101111 binary search*0.03950.03960.01770.01780.0870.087
bitm RankSelect101111 combined sampling*0.4910.3920.4600.3610.4290.331
succinct JacobsonRank*0.014860.014490.09310.09310.04360.0437
succinct Rank9*0.08380.08060.05340.05160.02070.0208
sucds Rank9Sel + hints*3.11680.61443.11070.61093.1330.641
sucds Rank9Sel*0.05110.04480.02490.02520.01030.0105
sux SelectFixed112.5712.520412.5502.513812.5152.534
sux SelectFixed215.6393.19015.6333.15515.6103.119
vers RsVec*0.01510.01590.0820.01010.0320.038
* uses a corresponding structure supporting rank queries; space overhead is extra

Select0 for uniform distribution of ones in the bit vector

bit vector length1010109108
percent of ones501050105010
space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]
bitm RankSelect101111 binary search*0.04170.04160.01930.01950.0890.090
bitm RankSelect101111 combined sampling*0.41200.41220.4770.4770.4320.433
succinct JacobsonRank*0.015080.014510.09790.09750.04520.0454
succinct Rank9*0.08410.07980.05470.05310.02170.0219
sucds Rank9Sel + hints*3.11705.61653.11085.61123.1355.634
sucds Rank9Sel*0.05340.04560.02720.02730.01070.0109
sux SelectFixed112.59322.56112.56222.55012.51722.515
sux SelectFixed215.64128.13715.63428.13415.61028.112
vers RsVec*0.01380.01460.0740.0720.0300.030
* uses a corresponding structure supporting rank queries; space overhead is extra

Rank for adversarial distribution with 99% of ones near the end of the vector

bit vector length1010109108
percent of ones501050105010
space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]
bitm RankSelect1011113.1233.1233.1213.1213.173.17
succinct JacobsonRank22.85222.85218.84418.84422.71222.712
succinct Rank925.01925.01825.01725.01725.0625.07
sucds Rank9Sel25.01925.01825.01725.01725.0625.07
vers RsVec4.7265.3264.7245.3244.765.36

Select1 for adversarial distribution with 99% of ones near the end of the vector

bit vector length1010109108
percent of ones501050105010
space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]
bitm RankSelect101111 binary search*0.02960.01870.01660.0970.0820.072
bitm RankSelect101111 combined sampling*0.4820.3650.4580.3330.4280.325
succinct JacobsonRank*0.013180.010450.08190.04770.04150.0369
succinct Rank9*0.07150.05710.04770.02320.01950.0168
sucds Rank9Sel + hints*3.11590.61153.1970.6463.1260.621
sucds Rank9Sel*0.04350.02780.01950.01270.0930.073
sux SelectFixed112.5512.55012.5392.53112.5122.516
sux SelectFixed215.6373.14115.6333.13215.6103.113
vers RsVec*0.01420.0870.0770.0390.0310.030
* uses a corresponding structure supporting rank queries; space overhead is extra

Select0 for adversarial distribution with 99% of ones near the end of the vector

bit vector length1010109108
percent of ones501050105010
space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]space overhead [%]time / query [ns]
bitm RankSelect101111 binary search*0.03350.04180.01820.01920.0840.088
bitm RankSelect101111 combined sampling*0.41070.41180.4740.4760.4310.432
succinct JacobsonRank*0.013910.014270.08940.09630.04370.0449
succinct Rank9*0.07370.08080.04900.05190.02070.0216
sucds Rank9Sel + hints*3.11605.61643.1985.61103.1275.630
sucds Rank9Sel*0.04150.04500.02200.02640.0960.0105
sux SelectFixed112.55622.55412.54322.54612.51322.514
sux SelectFixed215.63928.13615.63428.13415.61028.113
vers RsVec*0.01320.01460.0700.0710.0300.031
* uses a corresponding structure supporting rank queries; space overhead is extra

Dependencies

~16–47MB
~800K SLoC