
dhat

A library for heap profiling and ad hoc profiling with DHAT

7 releases

0.2.4 Nov 20, 2021
0.2.3 Nov 15, 2021
0.2.2 Jan 13, 2021
0.2.1 Dec 13, 2020
0.1.1 Dec 8, 2020


MIT/Apache

57KB
912 lines

dhat-rs

This crate provides heap profiling and ad hoc profiling capabilities to Rust programs, similar to those provided by DHAT.

See the crate documentation for details on how to use it.

License

Licensed under either of

  • Apache License, Version 2.0
  • MIT license

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.


lib.rs:

This crate provides heap profiling and ad hoc profiling capabilities to Rust programs, similar to those provided by DHAT.

The heap profiling works by using a global allocator that wraps the system allocator and tracks all heap allocations. On program exit, it writes the collected data to a file that can be viewed with DHAT's viewer. This corresponds to DHAT's --mode=heap mode.
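
The wrapping idea can be illustrated with nothing but the standard library. The following is a minimal sketch under stated assumptions, not this crate's implementation: it records no backtraces and writes no file, and the names CountingAlloc, TOTAL_BYTES, TOTAL_BLOCKS, CURR_BYTES, and MAX_BYTES are hypothetical. It only keeps the kind of running totals that appear in the "Total" and "At t-gmax" summary lines.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counters; a real heap profiler also keeps per-backtrace data.
pub static TOTAL_BYTES: AtomicUsize = AtomicUsize::new(0); // bytes ever allocated
pub static TOTAL_BLOCKS: AtomicUsize = AtomicUsize::new(0); // blocks ever allocated
pub static CURR_BYTES: AtomicUsize = AtomicUsize::new(0); // bytes currently live
pub static MAX_BYTES: AtomicUsize = AtomicUsize::new(0); // peak live bytes (cf. t-gmax)

/// Wraps the system allocator and counts every successful allocation.
pub struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Forward the request to the real allocator first.
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            let size = layout.size();
            TOTAL_BYTES.fetch_add(size, Ordering::Relaxed);
            TOTAL_BLOCKS.fetch_add(1, Ordering::Relaxed);
            // Update the live total, then raise the recorded peak if needed.
            let curr = CURR_BYTES.fetch_add(size, Ordering::Relaxed) + size;
            MAX_BYTES.fetch_max(curr, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        CURR_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
    }
    // `realloc` keeps the trait's default, which calls `alloc` then `dealloc`
    // on this type, so a reallocation is counted as a new block.
}

// Route every heap allocation in the program through the wrapper.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;
```

The bookkeeping uses only atomics, so the allocator never allocates while handling an allocation, which avoids infinite recursion. Because it is installed with #[global_allocator], every Box, Vec, and String in the program passes through it.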

Ad hoc profiling is provided by a second mode of operation, in which ad hoc events are manually inserted into a Rust program for aggregation and viewing. This corresponds to DHAT's --mode=ad-hoc mode.

Motivation

DHAT is a powerful heap profiler that comes with Valgrind. This crate is a related but alternative choice for heap profiling Rust programs. DHAT and this crate have the following differences.

  • This crate works on any platform, while DHAT only works on some platforms (Linux, mostly). (Note that DHAT's viewer is just HTML+JS+CSS and should work in any modern web browser on any platform.)
  • This crate causes a much smaller slowdown than DHAT.
  • This crate requires some modifications to a program's source code and recompilation, while DHAT does not.
  • This crate cannot track memory accesses the way DHAT does, because it does not instrument all memory loads and stores.
  • This crate does not provide profiling of copy functions such as memcpy and strcpy, unlike DHAT.
  • The backtraces produced by this crate may be better than those produced by DHAT.
  • DHAT measures a program's entire execution, but this crate only measures what happens within the scope of main. It will miss the small number of allocations that occur before or after main, within the Rust runtime.

Configuration

In your Cargo.toml file, as well as specifying dhat as a dependency, you should enable source line debug info:

[profile.release]
debug = 1

Usage (heap profiling)

For heap profiling, enable the global allocator by adding this code to your program:

use dhat::{Dhat, DhatAlloc};

#[global_allocator]
static ALLOCATOR: DhatAlloc = DhatAlloc;

Then add the following code to the very start of your main function:

let _dhat = Dhat::start_heap_profiling();

DhatAlloc is slower than the system allocator, so it should only be enabled while profiling.

Usage (ad hoc profiling)

Ad hoc profiling involves manually annotating hot code points and then aggregating the executed annotations in some fashion.

To do this, add the following code to the very start of your main function:

use dhat::Dhat;
let _dhat = Dhat::start_ad_hoc_profiling();

Then insert calls like this at points of interest:

dhat::ad_hoc_event(100);

For example, imagine you have a hot function that is called from many call sites. You might want to know how often it is called and which other functions call it the most. In that case, you would add an ad_hoc_event call to that function, and the data collected by this crate and viewed with DHAT's viewer would show you exactly what you want to know.

The meaning of the integer argument to ad_hoc_event will depend on exactly what you are measuring. If there is no meaningful weight to give to an event, you can just use 1.
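
As a rough, std-only illustration of the hot-function scenario above, the sketch below aggregates weighted events per call site. It is not this crate's mechanism: it swaps full backtraces for #[track_caller] and std::panic::Location, so each event is keyed only by the file and line of the nearest caller that is not itself #[track_caller]. The names record_event, hot_function, totals, and sites_by_units are hypothetical and not part of dhat's API.

```rust
use std::cmp::Reverse;
use std::collections::BTreeMap;
use std::panic::Location;
use std::sync::Mutex;

/// A call site, identified by (file, line).
type Site = (&'static str, u32);

/// Call site -> (event count, total units).
static EVENTS: Mutex<BTreeMap<Site, (u64, u64)>> = Mutex::new(BTreeMap::new());

/// Hypothetical stand-in for an ad hoc event: adds `weight` units to the
/// call site reported by `Location::caller()`.
#[track_caller]
pub fn record_event(weight: u64) {
    let loc = Location::caller();
    let mut events = EVENTS.lock().unwrap();
    let entry = events.entry((loc.file(), loc.line())).or_insert((0, 0));
    entry.0 += 1; // one more event at this site
    entry.1 += weight; // accumulate its weight in units
}

/// A hot function; `#[track_caller]` attributes its events to *its* callers.
#[track_caller]
pub fn hot_function(x: u64) -> u64 {
    record_event(1); // no meaningful weight, so use 1
    x.wrapping_mul(31)
}

/// Totals across all call sites as (units, events), like the summary line.
pub fn totals() -> (u64, u64) {
    let events = EVENTS.lock().unwrap();
    events
        .values()
        .fold((0, 0), |(units, count), &(n, w)| (units + w, count + n))
}

/// Call sites sorted by units, heaviest first.
pub fn sites_by_units() -> Vec<(Site, (u64, u64))> {
    let events = EVENTS.lock().unwrap();
    let mut sites: Vec<(Site, (u64, u64))> = events.iter().map(|(k, v)| (*k, *v)).collect();
    sites.sort_by_key(|&(_, (_, units))| Reverse(units));
    sites
}
```

Calling hot_function three times from one line and once from another produces two sites, with the first one ranked highest at 3 units. Keying by a single source location is much coarser than the backtraces DHAT's viewer shows, but the counting and weighting follow the same idea.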

Running

For both heap profiling and ad hoc profiling, the program will run more slowly than normal. (Unfortunately, on Windows, it may run much more slowly. This is because backtrace gathering can be drastically slower on Windows than on other platforms.)

When the Dhat value is dropped at the end of main, some basic information will be printed to stderr. For heap profiling it will look like the following.

dhat: Total:     1,256 bytes in 6 blocks
dhat: At t-gmax: 1,256 bytes in 6 blocks
dhat: At t-end:  1,256 bytes in 6 blocks
dhat: The data in dhat-heap.json is viewable with dhat/dh_view.html

For ad hoc profiling it will look like the following.

dhat: Total:     141 units in 11 events
dhat: The data in dhat-ad-hoc.json is viewable with dhat/dh_view.html

A file called dhat-heap.json (for heap profiling) or dhat-ad-hoc.json (for ad hoc profiling) will be written. It can be viewed in DHAT's viewer.

If you don't see this output, it may be because your program called std::process::exit, which terminates a program without running any destructors. To work around this, explicitly call drop on the Dhat value just before the call to std::process::exit.

Viewing

Open a copy of DHAT's viewer, version 3.17 or later. There are two ways to do this.

  • Easier: Use the online version.
  • Harder: Clone the Valgrind repository with git clone git://sourceware.org/git/valgrind.git and open dhat/dh_view.html. (There is no need to build any code in this repository.)

Then click on the "Load…" button to load dhat-heap.json or dhat-ad-hoc.json.

DHAT's viewer shows a tree with nodes that look like this.

PP 1.1/6 {
  Total:     1,024 bytes (81.53%, 3,335,504.89/s) in 1 blocks (16.67%, 3,257.33/s), avg size 1,024 bytes, avg lifetime 61 µs (19.87% of program duration)
  Max:       1,024 bytes in 1 blocks, avg size 1,024 bytes
  At t-gmax: 1,024 bytes (81.53%) in 1 blocks (16.67%), avg size 1,024 bytes
  At t-end:  1,024 bytes (81.53%) in 1 blocks (16.67%), avg size 1,024 bytes
  Allocated at {
    #1: 0x10c1e4108: <alloc::alloc::Global as core::alloc::AllocRef>::alloc (alloc.rs:203:9)
    #2: 0x10c1e4108: alloc::raw_vec::RawVec<T,A>::allocate_in (raw_vec.rs:186:45)
    #3: 0x10c1e4108: alloc::raw_vec::RawVec<T,A>::with_capacity_in (raw_vec.rs:161:9)
    #4: 0x10c1e4108: alloc::raw_vec::RawVec<T>::with_capacity (raw_vec.rs:92:9)
    #5: 0x10c1e4108: alloc::vec::Vec<T>::with_capacity (vec.rs:355:20)
    #6: 0x10c1e4108: std::io::buffered::BufWriter<W>::with_capacity (buffered.rs:517:46)
    #7: 0x10c1e4108: std::io::buffered::LineWriter<W>::with_capacity (buffered.rs:925:29)
    #8: 0x10c1e4108: std::io::buffered::LineWriter<W>::new (buffered.rs:905:9)
    #9: 0x10c1e4108: std::io::stdio::stdout::stdout_init (stdio.rs:543:65)
    #10: 0x10c1e4108: std::io::lazy::Lazy<T>::init (lazy.rs:57:19)
    #11: 0x10c1e4108: std::io::lazy::Lazy<T>::get (lazy.rs:33:18)
    #12: 0x10c1e4108: std::io::stdio::stdout (stdio.rs:536:25)
    #13: 0x10c1e4ccb: std::io::stdio::print_to::{{closure}} (stdio.rs:890:13)
    #14: 0x10c1e4ccb: std::thread::local::LocalKey<T>::try_with (local.rs:265:16)
    #15: 0x10c1e4ccb: std::io::stdio::print_to (stdio.rs:879:18)
    #16: 0x10c1e4ccb: std::io::stdio::_print (stdio.rs:907:5)
    #17: 0x10c0d6826: heap::main (heap.rs:9:5)
  }
}
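
The percentages and rates in such a node are relative to program-wide figures. As a worked check (our own arithmetic, not code from this crate), the sketch below assumes the program totals are the 1,256 bytes in 6 blocks from the heap profiling summary shown earlier, which the 81.53% figure is consistent with. It then recovers the program duration from the bytes-per-second rate. The function names are hypothetical.

```rust
/// Share of `part` in `total`, as a percentage.
pub fn percent(part: f64, total: f64) -> f64 {
    100.0 * part / total
}

/// Recomputes the example node's derived figures from its raw inputs.
/// Returns (byte %, block %, program duration in µs, lifetime % of duration).
pub fn example_node() -> (f64, f64, f64, f64) {
    let (node_bytes, node_blocks) = (1024.0, 1.0);
    let (total_bytes, total_blocks) = (1256.0, 6.0); // assumed program totals
    let bytes_per_sec = 3_335_504.89; // "3,335,504.89/s" in the node
    let avg_lifetime_us = 61.0; // "avg lifetime 61 µs"

    // rate = bytes / duration, so duration = bytes / rate.
    let duration_us = node_bytes / bytes_per_sec * 1e6;
    (
        percent(node_bytes, total_bytes),
        percent(node_blocks, total_blocks),
        duration_us,
        percent(avg_lifetime_us, duration_us),
    )
}
```

Rounded, this gives 81.53% of bytes, 16.67% of blocks, a program duration of about 307 µs, and a lifetime of 19.87% of that duration, matching the node.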

Full details about the output are in the DHAT documentation.

Note that DHAT uses the word "block" rather than "allocation" to refer to the memory allocated by a single heap allocation operation.

When heap profiling, this crate doesn't track memory accesses (unlike DHAT) and so the "reads" and "writes" measurements are not shown within DHAT's viewer, and "sort metric" views involving reads, writes, or accesses are not available.

The backtraces produced by this crate are trimmed to reduce output file sizes and improve readability in DHAT's viewer.

  • Only one allocation-related frame will be shown at the top of the backtrace. That frame may be a function within alloc::alloc, a function within this crate, or a global allocation function like __rg_alloc.
  • Common frames at the bottom of backtraces, below main, are omitted.
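
The two trimming rules above can be illustrated on a list of frame names. This is a hypothetical sketch, not this crate's code. It assumes frames are ordered from the allocation site (top) down to the program entry point (bottom). It also uses simple, made-up name tests (containing alloc::alloc::, starting with dhat:: or __rg_, or ending in main) to classify frames.

```rust
/// Assumed test for "allocation-related" frames: `alloc::alloc` functions,
/// this crate's own functions, or global allocation shims like `__rg_alloc`.
fn is_alloc_frame(name: &str) -> bool {
    name.contains("alloc::alloc::")
        || name.starts_with("dhat::")
        || name.starts_with("<dhat::")
        || name.starts_with("__rg_")
}

/// Assumed test for the program's `main` (e.g. `heap::main`).
fn is_main_frame(name: &str) -> bool {
    name == "main" || name.ends_with("::main")
}

/// Keeps one allocation-related frame at the top and drops frames below `main`.
pub fn trim_frames<'a>(frames: &[&'a str]) -> Vec<&'a str> {
    // Length of the leading run of allocation-related frames.
    let leading = frames.iter().take_while(|f| is_alloc_frame(f)).count();
    // Keep only the last frame of that run (or everything if the run is empty).
    let start = leading.saturating_sub(1);
    // Cut everything after the first `main` frame at or below `start`.
    let end = frames[start..]
        .iter()
        .position(|f| is_main_frame(f))
        .map_or(frames.len(), |i| start + i + 1);
    frames[start..end].to_vec()
}
```

The name tests are deliberately crude. For example, a function in some module that happens to be called main would also match is_main_frame, so a real implementation would need something more robust.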

Dependencies

~5.5MB
~112K SLoC