3 releases (stable)
Uses new Rust 2024
| Version | Released |
|---|---|
| 2.0.0 (new) | Nov 4, 2025 |
| 1.0.0 | Oct 7, 2025 |
| 0.1.0 | Oct 6, 2025 |
Gix Of Theseus
A re-implementation of Git Of Theseus by Erik Bernhardsson, with fewer features but hundreds of times faster.
Generates graphs of the composition of codebases over time:
Git's composition over time (6s to generate).
The Linux repo (~1 minute to generate):

It's fast for two reasons. First, it uses a specialized algorithm (inspired by hercules) to implement its own "incremental" git blame, which keeps track of results as it scans the history of the repo. Second, it's written in Rust, which gives it access to the wonderful gitoxide and rayon crates.
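To make the "incremental" idea concrete, here is a minimal Python sketch of keeping a per-line blame up to date as commits are replayed in order, rather than re-running a full blame at every sampled commit. It is an illustration only, not the code gix-of-theseus actually uses: the `apply_commit` helper, the `(start, removed, added)` hunk format, and the use of the commit year as the cohort label are all assumptions made for the example.

```python
from collections import Counter


def apply_commit(blame, hunks, cohort):
    """Return the blame for a file after one commit.

    blame  -- one cohort label per line of the previous version of the file
    hunks  -- hypothetical diff hunks as (start, removed, added) tuples, where
              `start` is a 0-based line index into the previous version;
              hunks must be sorted by `start` and must not overlap
    cohort -- label for lines introduced by this commit (here: its year)
    """
    new_blame = []
    cursor = 0  # next line of the previous version not yet copied
    for start, removed, added in hunks:
        new_blame.extend(blame[cursor:start])  # untouched lines keep their cohort
        new_blame.extend([cohort] * added)     # new lines belong to this commit
        cursor = start + removed               # deleted/replaced lines are dropped
    new_blame.extend(blame[cursor:])           # tail after the last hunk
    return new_blame


# Replay a tiny history oldest-first; the blame is updated, never recomputed.
history = [
    (2019, [(0, 0, 3)]),             # create a 3-line file
    (2020, [(1, 1, 2)]),             # replace line 1 with two new lines
    (2021, [(0, 1, 0), (4, 0, 1)]),  # delete line 0, append one line
]
blame = []
snapshots = {}
for year, hunks in history:
    blame = apply_commit(blame, hunks, year)
    snapshots[year] = Counter(blame)  # surviving lines per cohort at this commit
```

Each entry in `snapshots` corresponds to one column of a stack plot: the number of surviving lines per cohort at that point in history.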
Installation
Install this project with cargo:
cargo install gix-of-theseus
The plots are generated by a modernised version of the original plotting scripts, bundled into the binary. Running them requires a PEP 723 runner (uv or pipx) on your computer, so make sure you have one installed. For example, both are distributed on PyPI:
pip install uv
# or
pip install pipx
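For context, PEP 723 lets a single Python file declare its own dependencies in a comment block at the top, which a runner such as uv or pipx reads before executing the script. The sketch below shows what such a block looks like and how a runner might locate it. The `EXAMPLE_SCRIPT` contents (including the `matplotlib` dependency) and the `extract_script_metadata` helper are made up for illustration; they are not taken from the bundled plotting script, and real runners follow the regular expression given in the PEP rather than this simplified line scan.

```python
from typing import Optional

# Hypothetical script with a PEP 723 inline metadata block at the top.
EXAMPLE_SCRIPT = "\n".join([
    "# /// script",
    '# requires-python = ">=3.9"',
    '# dependencies = ["matplotlib"]',
    "# ///",
    'print("plotting would happen here")',
])


def extract_script_metadata(source: str) -> Optional[str]:
    """Return the TOML text inside the `script` block, or None if absent."""
    lines = source.splitlines()
    try:
        start = lines.index("# /// script")
    except ValueError:
        return None  # no metadata block
    body = []
    for line in lines[start + 1:]:
        if line == "# ///":
            return "\n".join(body)  # closing marker found
        if not line.startswith("#"):
            return None  # block was never closed
        # Strip the leading "# " (or a bare "#" for blank metadata lines).
        body.append(line[2:] if line.startswith("# ") else line[1:])
    return None


metadata = extract_script_metadata(EXAMPLE_SCRIPT)
```

The extracted text is plain TOML, which is what allows a runner to build a temporary environment with the listed dependencies before running the script.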
Usage
To get an image directly (if you have uv installed):
gix-of-theseus analyze ~/repos/git/git
This saves its results to ${repo_name}/stackplot.png. Choose a different output directory with --outdir.
The --no-plot flag will make the tool collect the data in the same cohorts.json format but not plot it.
You can also plot cohorts.json files separately with the stackplot command (given uv is installed):
gix-of-theseus stackplot cohorts.json
# equivalent of cloning this repo and doing:
uv run src/stackplot.py cohorts.json
By default, this tool does not count files that don't "look like" source code (i.e. files whose names don't end in a recognizable extension such as .cpp or .ts). You can turn this behavior off with the --all-filetypes flag.
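As a rough illustration of an extension-based filter like the one described above, the Python sketch below decides whether a path counts as source code. The `SOURCE_EXTENSIONS` set and the `looks_like_source` helper are hypothetical: only .cpp and .ts come from this README, and the tool's real list and matching rules may differ.

```python
from pathlib import PurePosixPath

# Hypothetical allow-list; only .cpp and .ts are named in the README.
SOURCE_EXTENSIONS = {".cpp", ".ts", ".rs", ".py"}


def looks_like_source(path: str, all_filetypes: bool = False) -> bool:
    """Return True if `path` should be counted."""
    if all_filetypes:  # mirrors the effect of --all-filetypes: count everything
        return True
    return PurePosixPath(path).suffix.lower() in SOURCE_EXTENSIONS


paths = ["src/main.cpp", "web/app.ts", "docs/logo.png", "Makefile"]
counted = [p for p in paths if looks_like_source(p)]
```

With the default settings only the first two paths are counted; passing `all_filetypes=True` counts all four.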
Caveats
This tool is faster because it doesn't re-implement the full feature set of Git of Theseus. Notably it doesn't:
- collect author information, or plot anything but the year of the commit
- plot the "forgetting curve" of commits
- The only behavior is --all-filetypes; there is no filtering to "only source code files", or only to count LOC. This is coming but not a big priority.
- .git-rev-ignore files are not supported.
I plan on implementing some of these features, but they are not present yet. I don't find the author information valuable, so I place a low priority on plotting it. "PRs welcome" if you really want it.
As this is a custom blame implementation, git-specific features like .git-rev-ignore files haven't been re-implemented yet. Some of these can be filled in later, but this tool is already useful for a lot of repos and this doesn't seem to be a major issue.
The stackplots generated are not 100% identical to the original's output. I would say they're 98% the same, which is fine for this type of analysis.
Some speed comparison for fun
These are rough measurements just for fun, to make this author feel better about spending all this time Rewriting It In Rust.
| Repo | Original [s] | This repo [s] | Speedup |
|---|---|---|---|
| torvalds/linux | ~36000 | 68 | ~530x |
| ffmpeg/ffmpeg | 8195 | 9.6 | 853x |
| elastic/elasticsearch | 8193 | 9.4 | 871x |
| python/cpython | 7397 | 15.0 | 493x |
| git/git | 3011 | 6.2 | 579x |
| golang/go | 3643 | 7.0 | 540x |
- Run on an M1 Max laptop with time.
- git-of-theseus was run with --procs 15 (seems a little IO bound on this machine) and with the --all-filetypes flag, to match this project's behavior.
The speedup on these large repos is conservatively ~500x, though the gap is larger in huge, old repos.
I speculate that the runtime (and the speedup) is more related to total volume of code (SLOC) in a repo, though theoretically the main improvement should be making it linear in the number of commits instead of quadratic.
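The linear-versus-quadratic argument can be made concrete with a toy cost model. The Python sketch below assumes, purely for illustration, that blaming from scratch at the k-th commit costs k units of work (it has to look back over k commits), that every commit is sampled, and that the incremental approach spends one unit per commit. These assumptions and both function names are hypothetical and are not measurements of either tool.

```python
def from_scratch_steps(num_commits: int) -> int:
    """Work when every sampled commit is blamed from scratch: 1 + 2 + ... + N."""
    return sum(range(1, num_commits + 1))


def incremental_steps(num_commits: int) -> int:
    """Work when each commit's diff is applied once to a running blame."""
    return num_commits


for n in (10, 100, 1000):
    ratio = from_scratch_steps(n) / incremental_steps(n)
    print(f"{n} commits: {from_scratch_steps(n)} vs {incremental_steps(n)} steps, ratio {ratio}")
```

Under this model the ratio is (N + 1) / 2, so it keeps growing with the length of the history. Real runtimes also depend on file sizes, sampling interval, and I/O, so this only shows the shape of the argument.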