1 unstable release

0.1.0	Oct 28, 2024

MIT/Apache

Yet Another Benchmarking framework powered by cachegrind

YAB is Yet Another Benchmarking framework powered by cachegrind from the Valgrind tool suite. It collects reproducible measurements of Rust code (e.g., the number of executed instructions, the number of L1 and L2/L3 cache hits, and the number of RAM accesses), which makes it suitable for use in CI.

Features

Supports newer cachegrind versions and customizing the cachegrind wrapper.
Supports capturing only instruction counts (i.e., not simulating CPU caches).
Conditionally injects CACHEGRIND_{START|STOP}_INSTRUMENTATION macros (available in cachegrind 3.22.0+), allowing for more precise measurements.
Supports configurable warm-up (defined in terms of executed instructions) before the capture.

Usage

Define a benchmark binary and declare it in your crate manifest:

[dev-dependencies]
yab = "0.1.0"

[[bench]]
name = "your_bench"
harness = false

In the bench source (benches/your_bench.rs in the example above), define a function with the signature fn(&mut yab::Bencher) and pass it to the yab::main! macro:

use yab::Bencher;

fn benchmarks(bencher: &mut Bencher) {
    // define your benchmarking code here
}

yab::main!(benchmarks);

Run benchmarks as usual using cargo bench (or cargo test --bench ... to test them).

Configuration options

Run cargo bench ... -- --help to get help on the supported configuration options. Some of the common options are:

--list: lists benchmarks without running them.
--print: prints results of the latest run instead of running benchmarks.
--jobs N / -j N: specifies the number of benchmarks to run in parallel. By default, it's equal to the number of logical CPUs in the system.
--verbose, --quiet: increases or decreases verbosity of benchmarking output.
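
The default for --jobs described above can be illustrated with the standard library alone. The following is a minimal sketch, not yab's actual implementation: `effective_jobs` is a hypothetical helper, and `std::thread::available_parallelism` is only an estimate of the usable parallelism (it may be lower than the logical CPU count, e.g. in a container with CPU limits).

```rust
use std::thread;

/// Hypothetical helper (not part of yab): picks the number of parallel jobs.
/// An explicit positive value wins; otherwise (no value, or zero) it falls back
/// to the parallelism estimate reported by the standard library, or 1 if that
/// estimate is unavailable.
fn effective_jobs(requested: Option<usize>) -> usize {
    match requested {
        Some(n) if n > 0 => n,
        _ => thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1),
    }
}

fn main() {
    // No `--jobs` given: use the estimated parallelism.
    println!("default jobs: {}", effective_jobs(None));
    // Equivalent of passing `-j 2`.
    println!("with -j 2: {}", effective_jobs(Some(2)));
}
```

On a shared CI runner, passing a smaller explicit value may be preferable to the default so that benchmarks do not compete with other workloads.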

Examples

use yab::{black_box, Bencher, BenchmarkId};

/// Suppose we want to benchmark this function
fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => 1,
        n => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn benchmarks(bencher: &mut Bencher) {
    // Benchmark simple functions.
    bencher
        .bench("fib_short", || fibonacci(black_box(10)))
        .bench("fib_long", || fibonacci(black_box(30)));
    // It's possible to benchmark parametric functions as well:
    for n in [15, 20, 25] {
        bencher.bench(
            BenchmarkId::new("fib", n),
            || fibonacci(black_box(n)),
        );
    }
    // To account for setup and/or teardown, you may use `bench_with_capture`
    bencher.bench_with_capture("fib_capture", |capture| {
        // This will not be included in the captured stats.
        black_box(fibonacci(black_box(30)));
        // This will be the only captured segment.
        let output = capture.measure(|| fibonacci(black_box(10)));
        // This assertion won't be captured either.
        // With the base case `0 | 1 => 1`, `fibonacci(10)` evaluates to 89.
        assert_eq!(output, 89);
    });
}

yab::main!(benchmarks);
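
Before benchmarking, it can help to sanity-check the function under test outside the harness. The sketch below uses only the standard library: `std::hint::black_box` stands in for `yab::black_box` (that the two behave alike is an assumption), and `fibonacci_iter` is a hypothetical reference implementation introduced purely for cross-checking.

```rust
use std::hint::black_box;

/// Same recursive definition as in the example above.
fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => 1,
        n => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

/// Hypothetical iterative reference with the same base cases
/// (`fibonacci_iter(0) == fibonacci_iter(1) == 1`).
fn fibonacci_iter(n: u64) -> u64 {
    let (mut prev, mut curr) = (1_u64, 1_u64);
    for _ in 0..n {
        let next = prev + curr;
        prev = curr;
        curr = next;
    }
    prev
}

fn main() {
    for n in [10, 15, 20, 25] {
        // `black_box` keeps the compiler from treating `n` as a known constant.
        let recursive = fibonacci(black_box(n));
        assert_eq!(recursive, fibonacci_iter(n));
        println!("fibonacci({n}) = {recursive}");
    }
}
```

A check like this catches a mismatch between the function and any value asserted inside a benchmark closure before cachegrind is involved.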

Here's sample benchmark output:

[Image: basic benchmark output]

More verbose output with the --verbose option, also showcasing changes to the benchmarked function:

[Image: verbose benchmark output]

Limitations

cachegrind has somewhat limited platform support (e.g., it doesn't support Windows).
cachegrind's CPU cache simulation is simplistic and outdated, to the point that recent versions disable it by default.
cachegrind has limited support for simulating multi-threaded environments.
Even small changes in the benchmarked code can lead to (generally small) divergences in the measured stats.
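
Because of the last point, a CI check that compares measurements against a stored baseline may work better with a relative tolerance than with exact equality. The following is a minimal sketch under that assumption; `within_tolerance` is a hypothetical helper, the instruction counts are made up, and nothing here is claimed to be part of yab.

```rust
/// Hypothetical check (not part of yab): is `current` within a relative
/// `tolerance` of `baseline`? For example, `0.01` allows a 1% deviation.
fn within_tolerance(baseline: u64, current: u64, tolerance: f64) -> bool {
    if baseline == 0 {
        // Avoid dividing by zero: only an identical zero measurement passes.
        return current == 0;
    }
    let diff = baseline.abs_diff(current) as f64;
    diff / baseline as f64 <= tolerance
}

fn main() {
    // Made-up instruction counts, purely for illustration.
    let baseline = 1_000_000;
    let current = 1_004_200;
    println!(
        "within 1% of baseline: {}",
        within_tolerance(baseline, current, 0.01)
    );
}
```

The right tolerance depends on the benchmark; a threshold that is too tight will flag the small divergences mentioned above as regressions.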

Alternatives and similar tools

This crate is heavily inspired by iai, the original cachegrind-based benchmarking framework for Rust.
iai-callgrind is an extended / reworked fork of iai. Compared to it, yab prefers simplicity to versatility.
Benchmarking APIs are inspired by criterion.

License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in yab by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
