Simple BLAS [sd]gemm benchmark

Jul 6, 2021


[sd]gemm benchmark


This is a small [sd]gemm benchmark, similar to ACES DGEMM, implemented in Rust. It supports the following BLAS libraries:

  • Accelerate (macOS)
  • Intel MKL
  • OpenBLAS

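To make concrete what such a benchmark measures, here is a minimal sketch, not the crate's actual code. A square n x n multiplication C = A * B performs n^3 multiply-add pairs, conventionally counted as 2n^3 floating-point operations, so throughput is 2n^3 x iterations / seconds / 10^9 GFLOPS. The naive triple loop stands in for the BLAS [sd]gemm call (f32 for sgemm, f64 for dgemm); the names `Element`, `naive_gemm`, `gemm_flops`, `gflops` and `bench_naive` are hypothetical, and how the real tool reports its results is not described here.

```rust
use std::ops::{Add, Mul};
use std::time::Instant;

/// Element types covered by the sketch: f32 (sgemm) and f64 (dgemm).
trait Element: Copy + Add<Output = Self> + Mul<Output = Self> {
    const ZERO: Self;
    const ONE: Self;
}
impl Element for f32 {
    const ZERO: Self = 0.0;
    const ONE: Self = 1.0;
}
impl Element for f64 {
    const ZERO: Self = 0.0;
    const ONE: Self = 1.0;
}

/// Naive row-major C = A * B for n x n matrices. A stand-in for the BLAS
/// [sd]gemm call; optimized BLAS kernels are blocked and vectorized.
fn naive_gemm<T: Element>(n: usize, a: &[T], b: &[T], c: &mut [T]) {
    assert!(a.len() == n * n && b.len() == n * n && c.len() == n * n);
    for i in 0..n {
        for j in 0..n {
            let mut acc = T::ZERO;
            for k in 0..n {
                acc = acc + a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}

/// Floating-point operations in one n x n x n multiplication:
/// n^3 multiply-add pairs, counted as two operations each.
fn gemm_flops(n: usize) -> f64 {
    2.0 * (n as f64).powi(3)
}

/// Throughput in GFLOPS for `iterations` multiplications taking `seconds` in total.
fn gflops(n: usize, iterations: usize, seconds: f64) -> f64 {
    gemm_flops(n) * iterations as f64 / seconds / 1e9
}

/// Times `iterations` naive multiplications of n x n matrices and returns GFLOPS.
fn bench_naive<T: Element>(n: usize, iterations: usize) -> f64 {
    let a = vec![T::ONE; n * n];
    let b = vec![T::ONE; n * n];
    let mut c = vec![T::ZERO; n * n];
    let start = Instant::now();
    for _ in 0..iterations {
        naive_gemm(n, &a, &b, &mut c);
    }
    gflops(n, iterations, start.elapsed().as_secs_f64())
}
```

With the default settings (256 x 256 matrices, 1,000 iterations), one multiplication costs 2 x 256^3 = 33,554,432 operations, so a full run performs about 3.4 x 10^10 floating-point operations. The generic element type mirrors the sgemm/dgemm split: a SIMD register holds half as many f64 values as f32 values, which is one reason dgemm throughput is typically lower.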

Build with Accelerate (macOS)

$ cargo install gemm-benchmark --features accelerate

Build with Intel MKL

To build the benchmark with Intel MKL statically linked, use:

$ cargo install gemm-benchmark --features intel-mkl

On AMD Zen CPUs, Intel MKL uses Zen-specific [sd]gemm kernels. However, on many Zen CPUs these kernels are slower than MKL's AVX2 kernels. You can build the benchmark with an override of MKL's Intel CPU detection, so that MKL uses the AVX2 kernels on Zen CPUs as well. This requires dynamic linking, since modifying the MKL binaries is not permitted. To enable the override, use the intel-mkl-amd feature:

$ cargo install gemm-benchmark --features intel-mkl-amd
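A commonly described way to perform this kind of override, and an assumption here since this README does not show how the intel-mkl-amd feature is implemented, is symbol interposition: MKL is reported to call an internal, undocumented function named `mkl_serv_intel_cpu_true` to decide whether it is running on an Intel CPU. If the executable exports its own definition under that name and MKL is linked dynamically, the dynamic linker can resolve MKL's call to the executable's version, so no MKL binary is modified. A minimal sketch:

```rust
use std::os::raw::c_int;

// Hypothetical sketch of the CPU-detection override (assumed mechanism).
// MKL is reported to consult this undocumented internal function; exporting
// a definition under the same unmangled name lets the dynamic linker bind
// MKL's call to this function instead of MKL's own implementation.
// In a real build this would sit behind the intel-mkl-amd Cargo feature.
// The `unsafe(no_mangle)` attribute syntax requires Rust 1.82 or later.
#[unsafe(no_mangle)]
pub extern "C" fn mkl_serv_intel_cpu_true() -> c_int {
    // Always answer "yes, this is an Intel CPU", so MKL selects its
    // AVX2 code path on Zen CPUs instead of the Zen-specific kernels.
    1
}
```

Because the trick depends on an internal MKL symbol, it may stop working with other MKL releases.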

Build with OpenBLAS

$ cargo install gemm-benchmark --features openblas

When using OpenBLAS, set OPENBLAS_NUM_THREADS=1 before running the benchmark.
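A likely reason for this (an assumption, not stated above) is that OpenBLAS can start its own worker threads, which would run on top of the threads requested with -t and oversubscribe the cores. The sketch below shows a hypothetical startup check that warns when the variable is not set to 1; `is_single_threaded` and `warn_if_openblas_threaded` are illustrative names, not part of gemm-benchmark.

```rust
use std::env;

/// Hypothetical helper: true when an OPENBLAS_NUM_THREADS value pins
/// OpenBLAS to a single thread. `None` means the variable is unset.
fn is_single_threaded(value: Option<&str>) -> bool {
    matches!(value.map(str::trim), Some("1"))
}

/// Hypothetical startup check: reads the variable from the process
/// environment and prints a warning to stderr if it is not "1".
fn warn_if_openblas_threaded() {
    let value = env::var("OPENBLAS_NUM_THREADS").ok();
    if !is_single_threaded(value.as_deref()) {
        eprintln!(
            "warning: OPENBLAS_NUM_THREADS is not 1; \
             OpenBLAS threads may compete with the benchmark threads"
        );
    }
}
```

In a POSIX shell the variable can be set for a single run, for example: OPENBLAS_NUM_THREADS=1 gemm-benchmark -t 4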

Usage

By default, sgemm is benchmarked using 256 x 256 matrices, 1,000 iterations, and 1 thread. The matrix dimension (-d), the number of iterations (-i), and the number of threads (-t) can be set with command-line flags. For example:

$ gemm-benchmark -d 1024 -i 2000 -t 4

This runs the benchmark using 1024 x 1024 matrices, 2,000 iterations, and 4 threads. To benchmark dgemm instead of sgemm, add the --dgemm option:

$ gemm-benchmark -d 1024 -i 2000 -t 4 --dgemm
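The documented flags and defaults can be summarized as a small configuration type. The parser below is a hypothetical, std-only sketch (the real tool's argument-parsing library, long option names, and error messages are not described here); `Config` and `parse_args` are illustrative names. It accepts -d, -i, -t and --dgemm and falls back to sgemm, 256 x 256 matrices, 1,000 iterations and 1 thread.

```rust
/// Benchmark settings; defaults follow the documented behaviour.
#[derive(Debug, PartialEq)]
struct Config {
    dim: usize,        // -d: matrices are dim x dim
    iterations: usize, // -i: number of multiplications
    threads: usize,    // -t: number of threads
    dgemm: bool,       // --dgemm: f64 (dgemm) instead of f32 (sgemm)
}

impl Default for Config {
    fn default() -> Self {
        Config { dim: 256, iterations: 1000, threads: 1, dgemm: false }
    }
}

/// Hypothetical parser for the flags above (program name already stripped).
fn parse_args<S: AsRef<str>>(args: &[S]) -> Result<Config, String> {
    let mut config = Config::default();
    let mut iter = args.iter().map(|s| s.as_ref());
    while let Some(flag) = iter.next() {
        match flag {
            "--dgemm" => config.dgemm = true,
            "-d" | "-i" | "-t" => {
                // Each numeric flag takes the next argument as its value.
                let value = iter
                    .next()
                    .ok_or_else(|| format!("missing value for {flag}"))?;
                let n: usize = value
                    .parse()
                    .map_err(|_| format!("invalid value for {flag}: {value}"))?;
                match flag {
                    "-d" => config.dim = n,
                    "-i" => config.iterations = n,
                    _ => config.threads = n,
                }
            }
            other => return Err(format!("unknown argument: {other}")),
        }
    }
    Ok(config)
}
```

In a program, the arguments would come from std::env::args().skip(1).collect::<Vec<String>>(), which can be passed directly since String implements AsRef<str>.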


