9 stable releases
Uses new Rust 2024
| Version | Released |
|---|---|
| 1.5.0 (new) | Oct 25, 2025 |
| 1.4.0 | Sep 23, 2025 |
| 1.3.1 | Aug 21, 2020 |
| 1.3.0 | Apr 16, 2020 |
| 1.1.1 | Mar 27, 2020 |
# gpu-trace-perf
This is a Rust rewrite of some tooling I built for comparing performance between different graphics driver settings on graphics traces. The goal is for a driver developer to be able to quickly experiment and find how their changes affect the performance of actual rendering.
Right now apitrace, renderdoc, gfxretrace, and angle_trace_tests are supported. You pass the tool a collection of GPU traces (or the path to the angle_perf_traces binary, which will then enumerate tests), and it will run through all of them, rerunning to get better stats over time, and give you an estimate of the change in FPS from your driver change.
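To make "an estimate of the change in FPS" concrete, here is a minimal sketch of comparing repeated FPS samples from the two drivers. It only compares arithmetic means; the statistics gpu-trace-perf actually uses are not described here, and the function names and sample values are assumptions made up for illustration.

```rust
/// Arithmetic mean of a slice of FPS samples; `None` for an empty slice.
fn mean(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    Some(samples.iter().sum::<f64>() / samples.len() as f64)
}

/// Relative change of the "after" mean versus the "before" mean, in percent.
/// Returns `None` if either side has no samples or the baseline mean is zero.
fn percent_change(before: &[f64], after: &[f64]) -> Option<f64> {
    let b = mean(before)?;
    let a = mean(after)?;
    if b == 0.0 {
        return None;
    }
    Some((a - b) / b * 100.0)
}

fn main() {
    // Made-up FPS samples for a single trace (not real measurements).
    let before = [100.0, 102.0, 98.0];
    let after = [110.0, 111.0, 109.0];
    match percent_change(&before, &after) {
        Some(pct) => println!("estimated FPS change: {pct:+.1}%"),
        None => println!("not enough data"),
    }
}
```

More replays add more samples per side, which is why letting the tool loop for longer tightens the estimate.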
### Installing

    apt-get install cargo
    cargo install gpu-trace-perf

For apitrace traces (`*.trace`), you also need apitrace installed. I recommend enabling apitrace's waffle backend and setting `WAFFLE_PLATFORM=gbm` in the environment so that replay windows don't constantly flicker on the screen.
For renderdoc traces (`*.rdc`), you need:
- python3
- renderdoc installed (`sudo apt-get install renderdoc`)
- renderdoc's python module findable from python3
### Example usage

    gpu-trace-perf run --traces $HOME/src/traces-db beforedriver afterdriver

This command will find all the traces in traces-db and run them in a loop, printing stats until you feel ready to hit ^C.

The beforedriver and afterdriver arguments are scripts in your path that set up the environment for the driver build you want to test and then run the command they are given, like this:

    #!/bin/sh
    export LD_LIBRARY_PATH=$HOME/src/prefix/lib
    "$@"

Since a traces db may be large and a change being tested may only affect a subset of the traces, you can restrict repeated replay to only those traces whose stderr output is changed by some debug environment variables:

    # Only re-run traces that had their shader compiler or command stream output changed on NVK.
    gpu-trace-perf run --traces $HOME/src/traces-db beforedriver afterdriver \
        --debug-filter "NVK_DEBUG=push_dump" --debug-filter "NAK_DEBUG=print"

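The filtering idea can be sketched as a comparison of the debug output captured under each driver. This is an illustration under assumptions, not the tool's implementation: `DebugOutput` and `traces_affected` are hypothetical names, and real driver dumps may contain run-to-run noise (addresses, timestamps) that would need normalizing before comparing.

```rust
use std::collections::BTreeMap;

/// Stderr captured for one trace under each driver, with the same debug
/// environment variables set for both runs. Hypothetical type.
struct DebugOutput {
    before: String,
    after: String,
}

/// Names of the traces whose debug output differs between the two drivers.
fn traces_affected(outputs: &BTreeMap<String, DebugOutput>) -> Vec<&str> {
    outputs
        .iter()
        .filter(|(_, out)| out.before != out.after)
        .map(|(name, _)| name.as_str())
        .collect()
}

fn main() {
    // Made-up trace names and shader dumps.
    let mut outputs = BTreeMap::new();
    outputs.insert(
        "game-a.trace".to_string(),
        DebugOutput { before: "mov r0, r1".into(), after: "mov r0, r1".into() },
    );
    outputs.insert(
        "game-b.trace".to_string(),
        DebugOutput { before: "mad r0, r1, r2, r3".into(), after: "fma r0, r1, r2, r3".into() },
    );
    // Only game-b.trace changed, so only it would be replayed repeatedly.
    println!("{:?}", traces_affected(&outputs));
}
```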
### Snapshot testing

When testing performance, you often want to also ensure that you haven't regressed rendering, so the tool provides a snapshot (screenshot) capture and diffing mode. This is a relatively new feature, and is currently supported only for apitrace traces. You can invoke it with the "snapshot" subcommand:

    # Run with the system driver.
    gpu-trace-perf snapshot --traces ~/src/traces-db --output baseline/
    # Run with your new GPU driver.
    meson devenv -C ~/src/mesa/build gpu-trace-perf snapshot --traces ~/src/traces-db --output test/ --baseline baseline/

The output directory will contain an index.html to summarize and visualize the differences between the baseline run and your new one. It uses JavaScript for some visualization, so for full functionality you'll need to run an HTTP server to avoid CORS errors:

    sudo apt install python3-rangehttpserver
    cd test/ && python3 -m RangeHTTPServer
    firefox localhost:8000

### Running D3D traces

When it detects D3D traces (apitraces, or renderdoc traces with "dx9" or similar in the name), gpu-trace-perf will run the appropriate replay tool under wine. You will want to set your WINEPATH to include the apitrace and renderdoc directories that contain the replay commands, e.g.:

    export WINEPATH="$HOME/apitrace-latest-win64/bin/;$HOME/RenderDoc_1.39_64/"

### Running ANGLE traces
To include ANGLE traces in your test list, include the angle_trace_tests binary
in the `-t` list (or the traces directory). You can also select a specific
ANGLE trace with (for example):
gpu-trace-perf -t
### utrace timings
Normally, the trace tool-provided timings are used for the FPS result. However,
sometimes you only care about a specific subset of your command buffers, and
watching just those can reduce the noise of the timings. If the driver supports
utrace, you can pass "--utrace <comma_separated_utrace_events>" to use those
instead of the trace tool's timings. This can be a particular help with
renderdoc traces, which have high CPU overhead for per-renderpass setup on
Vulkan. Note that if your driver has a nonstandard prefix for start/end events,
you may need to add it to u_trace::Frame::event_times(). You can also specify
"drawcalls" as the event to expand to a list of common draw and compute events
across several drivers.
*NOTE* If you are testing apitraces, make sure you have an apitrace newer than
13.0 that includes https://github.com/apitrace/apitrace/pull/961 to get correct
utrace results from GL traces. D3D traces still don't tear down properly, so
their utrace results will be unreliable.
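
The `--utrace` argument is described above as a comma-separated list in which `drawcalls` expands to a set of events. A minimal sketch of that kind of expansion follows. The event names in `DRAWCALL_EVENTS` are invented placeholders and `expand_events` is a hypothetical helper; neither is taken from gpu-trace-perf.

```rust
/// Placeholder names standing in for the "common draw and compute events".
/// The real list is defined by the tool and is not reproduced here.
const DRAWCALL_EVENTS: &[&str] = &["draw", "draw_indexed", "compute"];

/// Split a comma-separated event list, expanding the `drawcalls` alias.
fn expand_events(arg: &str) -> Vec<String> {
    let mut events = Vec::new();
    for name in arg.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        if name == "drawcalls" {
            events.extend(DRAWCALL_EVENTS.iter().map(|e| (*e).to_string()));
        } else {
            events.push(name.to_string());
        }
    }
    events
}

fn main() {
    // "render_pass" is also a made-up event name.
    println!("{:?}", expand_events("render_pass,drawcalls"));
}
```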
### Shader heuristic analysis
Sometimes as a developer, you want to select between two modes of compiling or
running a shader based on some heuristic. This tool lets you generate the A/B
times per shader once, then iterate on your heuristic in the quick-to-run Rust
code instead of doing lengthy trace replay runs for each idea you come up with.
The requirements are:
- The driver has utrace events for the start/end of draw call times.
- The draw events include the hashes of the shaders for each stage involved in
  the draw.
- The change to the shader is represented in the shader hash.
- `check_debug_filter()` includes the shader dumping env vars for your driver.
- Your before/after scripts only append to any shader debug env vars also
  involved in shader dumping.
- `capture_draw_times()` has the utrace event names and the shader stage names
  for your driver.
- `shader_parser.rs` supports parsing your driver's shader outputs.
Currently, turnip is supported, and some code exists to support other drivers,
but is incomplete. If you meet all the requirements, running looks like:

    gpu-trace-perf run append beforedriver afterdriver --output results/ \
        --capture-shaders --traces $HOME/src/traces-db
    gpu-trace-perf shader-analyze results/

For any traces with shaders that changed modes, it will dump the per-trace
per-shader times between the before and after environments, and a table showing
the overall effect on trace times between the available heuristics (always
choose driver B, always choose optimally, and a dummy heuristic as an example).
Then, edit shader_analyze.rs to replace the dummy heuristic with your own,
potentially adding multiple to the table.
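
The comparison described above can be sketched independently of the tool: given per-shader A/B times, each heuristic is a predicate that picks mode B for some shaders, and the table compares the resulting totals. `ShaderTimes`, its fields, and `total_time` are hypothetical names for illustration, not the contents of `shader_analyze.rs`, and the numbers are made up.

```rust
/// Per-shader measurement under both modes. Field names are illustrative.
struct ShaderTimes {
    instr_count: u32,
    time_a_ns: u64,
    time_b_ns: u64,
}

/// Total time if `choose_b` decides, per shader, whether to use mode B.
fn total_time(shaders: &[ShaderTimes], choose_b: impl Fn(&ShaderTimes) -> bool) -> u64 {
    shaders
        .iter()
        .map(|s| if choose_b(s) { s.time_b_ns } else { s.time_a_ns })
        .sum()
}

fn main() {
    let shaders = [
        ShaderTimes { instr_count: 40, time_a_ns: 1_000, time_b_ns: 1_200 },
        ShaderTimes { instr_count: 900, time_a_ns: 5_000, time_b_ns: 3_500 },
        ShaderTimes { instr_count: 300, time_a_ns: 2_000, time_b_ns: 2_100 },
    ];
    let always_a = total_time(&shaders, |_| false);
    let always_b = total_time(&shaders, |_| true);
    // Upper bound: pick whichever mode was faster for each shader.
    let optimal = total_time(&shaders, |s| s.time_b_ns < s.time_a_ns);
    // Example heuristic: use mode B only for large shaders.
    let by_size = total_time(&shaders, |s| s.instr_count > 200);
    println!("always A: {always_a} ns");
    println!("always B: {always_b} ns");
    println!("optimal:  {optimal} ns");
    println!("by size:  {by_size} ns");
}
```

The gap between a heuristic's total and the optimal total shows how much a better heuristic could still gain.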
### Cross building for your embedded device
Add the following to ~/.cargo/config.toml (older Cargo versions read ~/.cargo/config):
```
[target.armv7-unknown-linux-gnueabihf]
linker = "arm-linux-gnueabihf-gcc"
[target.aarch64-unknown-linux-gnu]
linker = "aarch64-linux-gnu-gcc"
```
And set up the new toolchain and build:
```
rustup target add aarch64-unknown-linux-gnu
cargo build --release --target aarch64-unknown-linux-gnu
scp target/aarch64-unknown-linux-gnu/release/gpu-trace-perf device:bin/
```
### License
Licensed under the MIT license
([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)