3 releases (stable)

Uses new Rust 2024

new 2.0.0 Nov 4, 2025
1.0.0 Oct 7, 2025
0.1.0 Oct 6, 2025

#120 in Visualization

345 downloads per month

Apache-2.0

340KB
1.5K SLoC

Gix Of Theseus

A re-implementation of Git of Theseus by Erik Bernhardsson, with fewer features but hundreds of times faster.

Generates graphs of the composition of codebases over time:

[Figure: a stack plot of the composition of git's source code over the years. Each year has its own color in the stack plot, making it look like a layer of sedimentary rock slowly weathered over time.]

Git's composition over time (6s to generate).

The Linux repo (~1 minute to generate):

[Figure: the same kind of graph, but for Linux.]

It's fast because it uses a specialized algorithm (inspired by hercules) to implement its own "incremental" git blame that keeps track of results as it scans the history of the repo, and because it's written in Rust, which gives it access to the wonderful gitoxide and rayon crates.
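The idea of an "incremental" blame can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the tool's actual algorithm: it handles a single file, uses Python's difflib as a stand-in for a git diff, and the names apply_commit and cohort_history are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def apply_commit(old_lines, old_labels, new_lines, cohort):
    """Carry blame labels across one commit instead of re-blaming from scratch.

    Lines that survive the diff keep their existing label; inserted or
    replaced lines are attributed to the new commit's cohort (its year).
    """
    new_labels = []
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            new_labels.extend(old_labels[i1:i2])
        else:
            # "replace" and "insert" add lines j1..j2; "delete" adds none.
            new_labels.extend([cohort] * (j2 - j1))
    return new_labels


def cohort_history(snapshots):
    """snapshots: list of (cohort, lines) for one file, oldest first.

    Returns one {cohort: surviving line count} dict per snapshot.
    """
    lines, labels, history = [], [], []
    for cohort, new_lines in snapshots:
        labels = apply_commit(lines, labels, new_lines, cohort)
        lines = new_lines
        history.append(dict(Counter(labels)))
    return history
```

Each commit is visited once, and the per-line labels are updated in place of a full blame, which is the property the paragraph above describes.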

Installation

Install this project with cargo:

cargo install gix-of-theseus

The plots are generated by a modernised version of the original plotting scripts (bundled into the binary). Running them requires a PEP 723 runner (uv or pipx) on your computer, so make sure you have one installed. For example, both are distributed on PyPI:

pip install uv
# or
pip install pipx
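For readers unfamiliar with PEP 723: a script declares its own dependencies in a comment block at the top, and the runner reads that block before executing the script. The sketch below shows what such a block looks like and how it can be located, using the regular expression from the PEP's reference implementation. The example script and its dependency list are hypothetical and are not taken from this project's stackplot.py.

```python
import re

# Regular expression from the PEP 723 reference implementation.
METADATA_BLOCK = (
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s"
    r"(?P<content>(^#(| .*)$\s)+)^# ///$"
)

# Hypothetical script; the dependency list is illustrative only.
EXAMPLE_SCRIPT = '''\
# /// script
# requires-python = ">=3.9"
# dependencies = [
#     "matplotlib",
# ]
# ///
print("plotting would happen here")
'''


def script_metadata(source):
    """Return the TOML text of the `script` metadata block, or None."""
    for match in re.finditer(METADATA_BLOCK, source):
        if match.group("type") == "script":
            # Strip the leading "# " (or bare "#") from every content line.
            return "".join(
                line[2:] if line.startswith("# ") else line[1:]
                for line in match.group("content").splitlines(keepends=True)
            )
    return None
```

A runner such as uv or pipx parses the returned text as TOML, installs the listed dependencies into a temporary environment, and then runs the script.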

Usage

To get an image directly (if you have uv installed):

gix-of-theseus analyze ~/repos/git/git

This saves its results to ${repo_name}/stackplot.png. Choose a different output directory with --outdir.

The --no-plot flag will make the tool collect the data in the same cohorts.json format but not plot it.
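If you want to post-process the data yourself, something like the following may be a starting point. It assumes the layout used by the original Git of Theseus cohort files (a "labels" list, a "ts" list of timestamps, and a "y" list holding one series per label); check this against a file you generated, since the layout is an assumption here and the sample values are made up.

```python
import json

# Made-up sample in the assumed cohorts.json layout.
SAMPLE = """
{
  "labels": ["Code added in 2023", "Code added in 2024"],
  "ts": ["2023-06-01T00:00:00", "2024-06-01T00:00:00"],
  "y": [[120, 90], [0, 200]]
}
"""


def latest_composition(cohorts):
    """Return {label: share of surviving lines} at the most recent sample."""
    last = [series[-1] for series in cohorts["y"]]
    total = sum(last)
    if total == 0:
        return {label: 0.0 for label in cohorts["labels"]}
    return {label: n / total for label, n in zip(cohorts["labels"], last)}


composition = latest_composition(json.loads(SAMPLE))
```

To use it on real output, replace json.loads(SAMPLE) with json.load on an opened cohorts.json file.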

You can also plot cohorts.json files separately with the stackplot command (given uv is installed):

gix-of-theseus stackplot cohorts.json

# equivalent of cloning this repo and doing:

uv run src/stackplot.py cohorts.json

By default, this tool only counts files that "look like" source code (e.g. those ending in a recognizable extension like .cpp or .ts). You can turn this behavior off with the --all-filetypes flag.
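A filter of this kind can be as simple as an extension check. The sketch below is illustrative only: the extension set and the is_counted helper are hypothetical, and the tool's real list of recognized extensions may differ.

```python
from pathlib import PurePosixPath

# Hypothetical extension list, for illustration only.
SOURCE_EXTENSIONS = {".c", ".cpp", ".h", ".py", ".rs", ".ts"}


def is_counted(path, all_filetypes=False):
    """Decide whether a file contributes lines to the plot."""
    if all_filetypes:
        # Mirrors the effect of --all-filetypes: nothing is filtered out.
        return True
    return PurePosixPath(path).suffix.lower() in SOURCE_EXTENSIONS
```

With this sketch, is_counted("docs/logo.png") is False, while is_counted("docs/logo.png", all_filetypes=True) is True.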

Caveats

This tool is faster because it doesn't re-implement the full feature set of Git of Theseus. Notably it doesn't:

  • collect author information, or plot anything but the year of the commit
  • plot the "forgetting curve" of commits
  • filter to "only source code files" or count only LOC: the only behavior is --all-filetypes. This is coming but is not a big priority.
  • support .git-rev-ignore files

I plan on implementing some of these features, but they are not present yet. I don't find the author information valuable, so I place a low priority on plotting it; "PRs welcome" if you really want it.

As this is a custom blame implementation, git-specific features like .git-rev-ignore files haven't been re-implemented yet. Some of these can be filled in later, but this tool is already useful for a lot of repos and this doesn't seem to be a major issue.

The stackplots generated are not 100% identical with the original's output. I would say they're 98% the same, which is fine for this type of analysis.

Some speed comparison for fun

These are rough measurements just for fun, to make this author feel better about spending all this time Rewriting It In Rust.

Repo                   Original [s]  This repo [s]  Speedup
torvalds/linux         ~36000        68             ~530x
ffmpeg/ffmpeg          8195          9.6            853x
elastic/elasticsearch  8193          9.4            871x
python/cpython         7397          15.0           493x
git/git                3011          6.2            579x
golang/go              3643          7.0            540x

  • Measured with time on an M1 Max laptop.
  • git-of-theseus was run with --procs 15 (seems a little IO bound on this machine) and with the --all-filetypes flag, to match this project's behavior.

The speedup on these large repos is conservatively ~500x, though the gap is larger in huge, old repos.

I speculate that the runtime (and the speedup) is more related to the total volume of code (SLOC) in a repo, though theoretically the main improvement should come from making the work linear in the number of commits instead of quadratic.
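The linear-versus-quadratic argument can be made concrete with a toy cost model. It assumes, as a simplification, that a from-scratch blame at a sampled commit walks the whole history up to that commit, and that samples are taken every fixed number of commits. Both functions are hypothetical and count commit visits only, not real runtime.

```python
def reblame_cost(n_commits, sample_every):
    """Commits walked when every sample re-blames from the start of history."""
    n_samples = n_commits // sample_every
    # The k-th sample sits at commit k * sample_every and walks back to commit 0.
    return sum(k * sample_every for k in range(1, n_samples + 1))


def incremental_cost(n_commits):
    """Commits walked when blame state is carried forward: each commit once."""
    return n_commits
```

Under this model, doubling the history from 1000 to 2000 commits (sampling every 10) raises the re-blame cost from 50500 to 201000 commit visits, roughly 4x, while the incremental cost only doubles.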

Dependencies

~31–48MB
~795K SLoC