7 releases

0.3.3 Feb 14, 2024
0.3.2 Feb 14, 2023
0.3.1 Jan 6, 2023
0.3.0 Aug 25, 2019
0.1.0 Jan 23, 2019

Apache-2.0

reap

A tool for parsing Ruby heap dumps and analyzing their reference graph.

Supports drilldown into just the memory retained by a given object, and optional graphical output.

When to use reap

reap is intended for optimizing memory usage as well as debugging memory leaks. If you have a snapshot of the heap of a Ruby process (see below for tips on getting one), reap can help you understand the contents of that snapshot.

To do so, reap builds a dominator tree from the reference graph. The tree shows which objects are holding on to large quantities of memory. (Node v "dominates" node w in a directed graph if all paths from a given root to w run through v. In the context of memory references, this implies that object w is only live because object v is live.)
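
As a rough illustration of the dominator idea (not reap's actual implementation, which this README does not describe), the following Ruby sketch computes dominator sets for a made-up five-object graph using the simple iterative set-intersection method. It then sums the sizes of everything an object dominates to get that object's retained size. The graph, the sizes, and the helper names dominators and retained_size are all hypothetical.

```ruby
require 'set'

# Hypothetical toy reference graph: each key references the objects in its value.
REFS = {
  root: [:a, :b],
  a:    [:c],
  b:    [:c, :d],
  c:    [],
  d:    []
}.freeze

# Hypothetical object sizes in bytes.
SIZES = { root: 0, a: 10, b: 20, c: 100, d: 5 }.freeze

# Naive iterative computation of dominator sets:
#   dom(root) = {root}
#   dom(n)    = {n} plus the intersection of dom(p) over all predecessors p of n
# Assumes every node is reachable from root.
def dominators(refs, root)
  nodes = refs.keys
  preds = Hash.new { |h, k| h[k] = [] }
  refs.each { |from, tos| tos.each { |to| preds[to] << from } }

  dom = nodes.to_h { |n| [n, Set.new(nodes)] }
  dom[root] = Set[root]

  changed = true
  while changed
    changed = false
    (nodes - [root]).each do |n|
      incoming = preds[n].map { |p| dom[p] }
      new_set = (incoming.empty? ? Set.new : incoming.reduce(:&)) | [n]
      next if new_set == dom[n]

      dom[n] = new_set
      changed = true
    end
  end
  dom
end

# Retained size of v: total size of every object that v dominates (v included).
def retained_size(dom, sizes, v)
  dom.select { |_node, ds| ds.include?(v) }.sum { |node, _| sizes[node] }
end

dom = dominators(REFS, :root)
puts retained_size(dom, SIZES, :a)    # 10: a dominates only itself
puts retained_size(dom, SIZES, :b)    # 25: b dominates itself and d
puts retained_size(dom, SIZES, :root) # 135: root dominates everything
```

Here c is referenced by both a and b, so neither of them dominates it. Its 100 bytes count only toward root's retained size.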

Limitations & comparisons

reap does not currently understand garbage collection "generations", which can also be useful for finding leaks.

The disadvantage of analyzing GC generations is that in order to collect the necessary data, you need to trace object allocations, which can be prohibitively expensive in production. If this is not a problem for you, you may want to try another tool such as heapy instead of, or in addition to, reap.
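
For context, allocation tracing in Ruby is switched on through the objspace extension. The sketch below is independent of reap and only illustrates what tracing adds: an object allocated while tracing is active carries extra fields, including its GC generation, in its dump record. The variable names are illustrative.

```ruby
require 'objspace'
require 'json'

# Start recording allocation metadata for every new object.
# This is the step that adds overhead in a busy process.
ObjectSpace.trace_object_allocations_start

traced = Object.new

# Dump a single object as JSON while its allocation info is still recorded.
record = JSON.parse(ObjectSpace.dump(traced))

ObjectSpace.trace_object_allocations_stop

# With tracing on, the record includes where and when the object was allocated.
puts record.key?('generation')
puts record.key?('file')
puts record.key?('line')
```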

reap is intended to provide useful data even when allocations are not being traced. It can also analyze fairly large (gigabyte-plus) heaps in seconds, thanks to being written in Rust rather than Ruby.

How to use reap

Run with --help for full documentation.

Basic usage:

$ cargo run -q --release -- /tmp/heap.json -f flamegraph.svg -c 3
Object types using the most live memory:
Thread: 2.1 MB (40 objects)
String: 462.6 KB (9235 objects)
Class: 223.7 KB (287 objects)
...: 653.0 KB (5909 objects)

Objects retaining the most live memory:
root: 3.4 MB (15472 objects)
Thread[0x7f83df87dc40]: 1.1 MB (25 objects)
Thread[0x7f83e107cd78]: 1.0 MB (7 objects)
...: 4.6 MB (59857 objects)

Object types retaining the most live memory:
ROOT: 3.4 MB (15472 objects)
Thread: 2.1 MB (70 objects)
ARRAY: 949.3 KB (13053 objects)
...: 3.6 MB (46766 objects)

Objects unreachable from root:
Class: 189.6 KB (617 objects)
String: 81.8 KB (1174 objects)
ARRAY: 38.6 KB (298 objects)
...: 91.5 KB (1422 objects)

Wrote 15471 nodes to flamegraph.svg

Dig into a subtree (in this case, the larger Thread):

$ cargo run -q --release -- /tmp/heap.json -d out.dot -c 3 -r 0x7f83df87dc40
Object types using the most live memory:
Thread: 1.0 MB (1 objects)
Class: 1.6 KB (3 objects)
Hash: 1.3 KB (7 objects)
...: 980 B (14 objects)

Objects retaining the most live memory:
Thread[0x7f83df87dc40]: 1.1 MB (25 objects)
Hash[0x7f83e10452d8][size=5]: 1.2 KB (6 objects)
Object[0x7f83df8d62c8][CLASS]: 992 B (8 objects)
...: 3.0 KB (24 objects)

Object types retaining the most live memory:
Thread: 1.1 MB (25 objects)
Hash: 2.2 KB (12 objects)
Class: 1.9 KB (10 objects)
...: 1.1 KB (16 objects)

Objects reachable from, but not dominated by, 0x7f83df87dc40:
String: 352.3 KB (6604 objects)
Class: 220.6 KB (283 objects)
Regexp: 108.8 KB (139 objects)
...: 465.2 KB (5716 objects)

Wrote 1 nodes & 0 edges to out.dot

Installation

Ensure you have Rust's cargo package manager installed, then run cargo install reap.

Getting a heap dump

If rbtrace is installed and has been required in the process you want to dump, you can run:

rbtrace -p $PID -e "Thread.new{require 'objspace';f=open('/tmp/heap.json','w');ObjectSpace.dump_all(output: f, full: true);f.close}"

Otherwise, you can connect to the Ruby process with gdb, then run:

call rb_eval_string_protect("Thread.new{require 'objspace';f=open('/tmp/heap.json','w');ObjectSpace.dump_all(output: f, full: true);f.close}", 0)
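
If you can run code inside the process yourself, the same dump can be written in a more readable form. The sketch below also tallies records by type as a quick sanity check, relying on the dump being one JSON object per line. The output path is a hypothetical choice. The sketch omits the full: true option used in the one-liners above, on the assumption that older Ruby versions may not accept it; add it back if your Ruby supports it and you want the same dump.

```ruby
require 'objspace'
require 'json'
require 'tmpdir'

# Hypothetical output location; the one-liners above use /tmp/heap.json.
path = File.join(Dir.tmpdir, "heap-#{Process.pid}.json")

# Write one JSON record per heap object to the file.
File.open(path, 'w') do |f|
  ObjectSpace.dump_all(output: f)
end

# Count records by their "type" field, skipping any line that fails to parse.
counts = Hash.new(0)
File.foreach(path) do |line|
  begin
    record = JSON.parse(line)
  rescue JSON::ParserError
    next
  end
  counts[record['type']] += 1
end

# Show the five most common record types.
counts.sort_by { |_type, n| -n }.first(5).each do |type, n|
  puts "#{type}: #{n} records"
end
```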
