#request #text-format #tracing #call #logging #spans #programs

reqray

Log ‘request x-rays’ for rust programs instrumented with tracing

11 unstable releases (3 breaking)

0.4.3 Feb 12, 2022
0.4.2 Feb 12, 2022
0.3.1 Feb 6, 2022
0.3.0 Dec 24, 2021
0.1.1 Dec 20, 2020

#114 in Profiling

MIT/Apache

52KB
759 lines

Reqray

Build Status Crate Docs

Log "request x-rays" for rust programs instrumented with tracing. This includes aggregated wall/own times as frequently found in flame graphs in a human-friendly text format.

To deploy it, you don't need any complicated services, just a little local code added to your instrumented program.

This often makes questions like these trivial to answer:

  • What part of the request takes the most time?
  • How many DB requests am I performing? Does the DB request aggregation work?
  • How far did the execution get before the error that aborted the request?

It looks like this:

2022-02-06T20:01:57.103747Z  INFO Call summary of request@examples/nested.rs:51

                        # calls │   ∑ alive ms │    ∑ busy ms │ ∑ own busy ms │ span tree
                    ────────────┼──────────────┼──────────────┼───────────────┼───────────────────────
                          0 001 ┊      258.910 ┊      258.890 ┊         0.106 ┊ ┬ request
                          0 001 ┊       87.204 ┊       87.190 ┊        19.299 ┊ ├┬ nested
                          0 001 ┊        0.036 ┊        0.021 ┊         0.021 ┊ ┊├─ random
                          1 000 ┊       75.738 ┊       61.912 ┊        61.912 ┊ ┊╰─ repeated
                          0 002 ┊        0.051 ┊        0.027 ┊         0.027 ┊ ├─ repeated
                          0 001 ┊        1.644 ┊        1.632 ┊         0.019 ┊ ├┬ nest_deeply
                          0 001 ┊        1.619 ┊        1.607 ┊         0.025 ┊ ┊╰┬ nest_deeply
                          0 001 ┊        1.593 ┊        1.577 ┊         0.024 ┊ ┊ ╰┬ nest_deeply
                          0 001 ┊        1.561 ┊        1.547 ┊         0.022 ┊ ┊  ╰┬ nest_deeply
                          0 001 ┊        1.532 ┊        1.520 ┊         0.023 ┊ ┊   ╰┬ nest_deeply
                          0 001 ┊        1.504 ┊        1.492 ┊         0.023 ┊ ┊    ╰┬ nest_deeply
                          0 001 ┊        1.476 ┊        1.463 ┊         0.025 ┊ ┊     ╰┬ nest_deeply
                          0 001 ┊        1.446 ┊        1.433 ┊         0.025 ┊ ┊      ╰┬ nest_deeply
                          0 001 ┊        1.415 ┊        1.402 ┊         1.402 ┊ ┊       ╰─ nest_deeply
                          0 001 ┊      169.915 ┊      169.905 ┊        17.883 ┊ ╰┬ nested2
                          0 001 ┊        0.010 ┊        0.001 ┊         0.001 ┊  ├─ random
                          1 000 ┊       88.793 ┊       76.081 ┊        76.081 ┊  ├─ repeated
                          0 001 ┊       70.386 ┊       70.377 ┊        19.332 ┊  ╰┬ nested
                          0 001 ┊        0.011 ┊        0.001 ┊         0.001 ┊   ├─ random
                          1 000 ┊       58.468 ┊       45.280 ┊        45.280 ┊   ╰─ repeated

  • calls: The total number of spans created at this call path.
  • ∑ alive ms: The total time spans at this call path were alive, i.e. the sum of the times between their new and close events.
  • ∑ busy ms: The total time spans at this call path were entered, i.e. the sum of the times between their enter and leave events.
  • ∑ own busy ms: The total time spans at this call path were entered without any of their children being entered.
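
These numbers are aggregated per call path, not per individual call: every span reached via the same chain of parent spans ends up in the same row. A tiny, hypothetical illustration of how such rows come about (function names made up):

    use tracing::instrument;

    #[instrument]
    fn request() {
        for _ in 0..1_000 {
            // 1 000 separate spans, but they all share the call path
            // "request > repeated" and therefore collapse into one row.
            repeated();
        }
    }

    #[instrument]
    fn repeated() {
        // Time spent here counts towards "∑ own busy ms" of the "repeated" row
        // and towards "∑ busy ms" (but not own busy) of the parent "request" row.
    }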

It looked like this until 0.3.x:

Dec 20 18:48:32.405  INFO Call summary of request@examples/nested.rs:47

                        # calls │    ∑ wall ms │     ∑ own ms │ span tree
                    ────────────┼──────────────┼──────────────┼───────────────────────
                          0 001 ┊      377.886 ┊        0.260 ┊ ┬ request
                          0 001 ┊      120.704 ┊       48.896 ┊ ├┬ nested
                          0 001 ┊        0.008 ┊        0.008 ┊ ┊├─ random
                          1 000 ┊       64.347 ┊       64.347 ┊ ┊╰─ repeated
                          0 002 ┊        0.118 ┊        0.118 ┊ ├─ repeated
                          0 001 ┊        3.818 ┊        0.049 ┊ ├┬ nest_deeply
                          0 001 ┊        3.762 ┊        0.053 ┊ ┊╰┬ nest_deeply
                          0 001 ┊        3.702 ┊        0.057 ┊ ┊ ╰┬ nest_deeply
                          0 001 ┊        3.637 ┊        0.056 ┊ ┊  ╰┬ nest_deeply
                          0 001 ┊        3.574 ┊        0.058 ┊ ┊   ╰┬ nest_deeply
                          0 001 ┊        3.503 ┊        0.061 ┊ ┊    ╰┬ nest_deeply
                          0 001 ┊        3.435 ┊        0.063 ┊ ┊     ╰┬ nest_deeply
                          0 001 ┊        3.365 ┊        0.066 ┊ ┊      ╰┬ nest_deeply
                          0 001 ┊        3.292 ┊        3.292 ┊ ┊       ╰─ nest_deeply
                          0 001 ┊      252.949 ┊       49.258 ┊ ╰┬ nested2
                          0 001 ┊        0.006 ┊        0.006 ┊  ├─ random
                          1 000 ┊       63.343 ┊       63.343 ┊  ├─ repeated
                          0 001 ┊      132.841 ┊       54.091 ┊  ╰┬ nested
                          0 001 ┊        0.007 ┊        0.007 ┊   ├─ random
                          1 000 ┊       70.875 ┊       70.875 ┊   ╰─ repeated

Setup

For a quick start, add/edit these [dependencies] in Cargo.toml:

tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["registry", "env-filter"] }
reqray = "0.4"

And add/edit your tracing layer setup:

    use reqray::CallTreeCollector;
    use tracing_subscriber::{EnvFilter, util::SubscriberInitExt, fmt, prelude::*};

    let fmt_layer = fmt::layer()
        .with_target(false);
    let filter_layer = EnvFilter::try_from_default_env()
        .or_else(|_| EnvFilter::try_new("info"))
        .unwrap();

    tracing_subscriber::registry()
    // -----------------------------------------------
        .with(CallTreeCollector::default())
    // -----------------------------------------------
        .with(filter_layer)
        .with(fmt_layer)
        .init();
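
Putting the pieces together, here is a minimal, hypothetical program using exactly this setup (the function names are made up); one call summary line should be logged when the handle_request span closes:

    use reqray::CallTreeCollector;
    use tracing::instrument;
    use tracing_subscriber::{EnvFilter, util::SubscriberInitExt, fmt, prelude::*};

    #[instrument]
    fn handle_request() {
        load_data();
    }

    #[instrument]
    fn load_data() {
        // Stand-in for a DB call or anything else expensive.
        std::thread::sleep(std::time::Duration::from_millis(25));
    }

    fn main() {
        let fmt_layer = fmt::layer().with_target(false);
        let filter_layer = EnvFilter::try_from_default_env()
            .or_else(|_| EnvFilter::try_new("info"))
            .unwrap();

        tracing_subscriber::registry()
            .with(CallTreeCollector::default())
            .with(filter_layer)
            .with(fmt_layer)
            .init();

        handle_request();
    }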

Instead of CallTreeCollector::default() you can choose a more explicit configuration:

    use reqray::{CallTreeCollectorBuilder, display::LoggingCallTreeCollectorBuilder};

    // ...
    let call_tree_collector = CallTreeCollectorBuilder::default()
        .max_call_depth(10)
        .build_with_collector(
            LoggingCallTreeCollectorBuilder::default()
                .left_margin(20)
                .build(),
        );

    tracing_subscriber::registry()
        .with(call_tree_collector)
        // ...

Compatibility with tracing-subscriber 0.2

Use reqray 0.2.x for integration with tracing-subscriber 0.2.x. Otherwise, the API should be identical.

E.g. color_eyre 0.5.x depends on tracing-error 0.1.x which requires tracing-subscriber 0.2.
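
For example, a project that is (hypothetically) pinned to tracing-subscriber 0.2 by such a dependency would use something like this in its Cargo.toml [dependencies]:

reqray = "0.2"
tracing = "0.1"
tracing-subscriber = { version = "0.2", features = ["registry", "env-filter"] }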

Overhead

I did basic performance testing (see benches) to check for obvious gotchas and didn't spot any. If your code actually talks to a database or does anything else expensive, the overhead should be in the same order of magnitude as the general logging overhead of the tracing library.

In my totally unrepresentative example, which does little more than emit a few log statements, the logging overhead increased by 30-50% -- roughly in proportion to the number of extra log lines added to the output in this case.

Generally, you should only instrument the relevant calls in your program, not every single one, and especially not those inside a CPU-bound loop. If you do have such spans, it might make sense to filter them out before they reach the CallTreeCollector, e.g. as sketched below.
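
One way to do that (a sketch only, using tracing-subscriber 0.3's per-layer filtering; the my_app::hot_loop target is made up) is to attach a filter to the call tree layer so the hot spans never reach it, while other layers still see them:

    use reqray::CallTreeCollector;
    use tracing_subscriber::{filter, fmt, prelude::*, util::SubscriberInitExt};

    // Drop spans from a hypothetical CPU-bound module before they reach the
    // call tree collector; the fmt layer below is unaffected by this filter.
    let call_tree_layer = CallTreeCollector::default()
        .with_filter(filter::filter_fn(|metadata| {
            metadata.target() != "my_app::hot_loop"
        }));

    tracing_subscriber::registry()
        .with(call_tree_layer)
        .with(fmt::layer())
        .init();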

I am very curious to hear actual stats in real life programs!

Inspiration

While we were working together, Klas Kalass created something similar for Java: the Fuava CTProfiler.

It proved to be immensely useful on a nearly daily basis. Thank you, Klas!

Since then, I have worked with sophisticated distributed tracing systems, but they often lacked aggregation possibilities. Others hacked some interesting aggregation scripts on top and I myself became somewhat obsessed with creating similar scripts.

Thanks

I felt very welcome when I suggested something like this in issue tracing#639. Thank you, @hawkw!

Similarly, Eliza was very supportive and helpful in the tracing Discord channel. Thank you, Eliza!

Contributions

Giving feedback or saying thanks on Twitter or in the tracing Discord channel is appreciated.

Contributions in the form of documentation and bug fixes are highly welcome. Please start a discussion with me (e.g. via an issue) before working on larger features.

I'd really appreciate tests for all new features. Please run cargo test before submitting a pull request. Just use cargo fmt for formatting.

Feature ideas are also welcome -- just know that this is a pure hobby side project and I will not allocate a lot of bandwidth to this. Therefore, important bug fixes are always prioritised.

By submitting a pull request, you agree to license your changes via all the current licenses of the project.

Dependencies

~2.2–4MB
~69K SLoC