
app git-llm-bundler

A tool to bundle git repositories into a single file for LLM context

1 unstable release

new 0.1.0 Apr 2, 2025

#356 in Parser implementations

Custom license

47KB
409 lines

git-llm-bundler 📦📄

Bundle relevant files from a Git repository into a single text file, optimized for Large Language Model (LLM) context.

License: GNU

Stop manually copying and pasting code into LLMs! git-llm-bundler intelligently selects and bundles text files from any Git repository, creating a clean, context-ready file perfect for feeding into models like GPT-4, Claude, Gemini, and others.

It respects .gitignore, filters out binary files, ignores overly large files, and cleans up whitespace, giving you just the relevant content. Plus, it provides token counts for popular models!

✨ Features

  • Git Integration: Clones any public or private (if accessible) Git repository.
  • Branch Selection: Specify a branch, tag, or commit hash to bundle.
  • Intelligent Filtering:
    • Respects .gitignore rules (both local and global).
    • Automatically skips binary files (images, executables, archives, etc.).
    • Excludes files exceeding a configurable size limit.
    • Ignores common unnecessary files and directories (.git, .vscode, node_modules, LICENSE); this list is currently hardcoded.
  • Clean Output: Removes excess blank lines while preserving indentation.
  • Detailed Statistics: Reports total lines, characters, and estimated token counts for:
    • cl100k_base (GPT-4, GPT-3.5-Turbo, GPT-4o)
    • p50k_base (Older GPT-3)
    • r50k_base (Legacy GPT-3)
  • Flexible Output:
    • Standard human-readable progress and stats.
    • --json mode for programmatic integration, outputting only a JSON object with the bundle path and metrics.
    • --verbose mode for detailed debugging information.
  • Cross-Platform: Built with Rust, runs on Linux, macOS, and Windows.
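The filtering and cleanup rules above can be sketched with the standard library alone. This is a minimal illustration, not the tool's actual implementation: the function names, the NUL-byte heuristic, the 8 KiB sample size, the 1 KB = 1024 bytes convention, and the choice to collapse blank-line runs to a single blank line are all assumptions.

```rust
use std::fs;
use std::io::{self, Read};
use std::path::Path;

/// Assumed heuristic: content is binary if a NUL byte appears in the first 8 KiB.
fn looks_binary(sample: &[u8]) -> bool {
    sample.iter().take(8192).any(|&b| b == 0)
}

/// True when `len_bytes` fits within `max_kb` kilobytes (1 KB = 1024 bytes assumed).
fn within_size_limit(len_bytes: u64, max_kb: u64) -> bool {
    len_bytes <= max_kb * 1024
}

/// Decide whether a file on disk passes both the size and the binary check.
fn should_include(path: &Path, max_kb: u64) -> io::Result<bool> {
    if !within_size_limit(fs::metadata(path)?.len(), max_kb) {
        return Ok(false);
    }
    // Read only the first 8 KiB; that is all the heuristic inspects.
    let mut sample = Vec::new();
    fs::File::open(path)?.take(8192).read_to_end(&mut sample)?;
    Ok(!looks_binary(&sample))
}

/// Collapse each run of blank lines into one blank line.
/// Leading indentation is left untouched; trailing whitespace is trimmed.
fn collapse_blank_lines(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut previous_blank = false;
    for line in text.lines() {
        let blank = line.trim().is_empty();
        if blank && previous_blank {
            continue; // skip the second and later blank lines of a run
        }
        out.push_str(line.trim_end());
        out.push('\n');
        previous_blank = blank;
    }
    out
}
```

With these helpers, `should_include(Path::new("src/main.rs"), 1000)` would mirror the default `--max-size` of 1000, and `collapse_blank_lines` would be applied to each file's text before it is written to the bundle.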

Requirements

  • Rust installed
  • Git command-line client installed (the tool currently shells out to git to fetch the repository).

🚀 Installation

Using Cargo

If you have Rust and Cargo installed, you can build and run directly from the source:

git clone https://github.com/cchance27/git-llm-bundler.git

cd git-llm-bundler

cargo run -- [OPTIONS] --repo-url <REPO_URL>

Required:

-r, --repo-url <REPO_URL>: The URL of the Git repository to clone (e.g., https://github.com/rust-lang/rust.git).

Options:

-o, --output <OUTPUT>: Path for the output bundle file [default: bundle.txt].

-b, --branch <BRANCH>: Specify a branch, tag, or commit hash to check out [default: repository's default branch].

-m, --max-size <MAX_SIZE>: Maximum size (in KB) for individual files to be included [default: 1000].

--json: Output results as a single JSON object to stdout (suppresses other console output). Conflicts with --verbose.

-v, --verbose: Print detailed processing information. Conflicts with --json.

-h, --help: Print help information.

-V, --version: Print version information.

Examples

  1. Basic Bundling (to bundle.txt)
cargo run -- --repo-url https://github.com/example/repo.git
  2. Bundling a Specific Branch to a Named File
cargo run -- -r https://github.com/example/repo.git -b master -o bundle-master.txt
  3. Bundling with a Smaller File Size Limit
cargo run -- -r https://github.com/example/repo.git -m 500 -o bundle-max-per-file.txt
  4. Debugging with Verbose Output
cargo run -- -r https://github.com/example/repo.git --verbose
  5. Getting JSON Output (for scripting)
cargo run -- -r https://github.com/example/repo.git --json > bundle.json
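
For scripting from Rust rather than a shell, the same invocation can be assembled with std::process::Command. This is only a sketch: it assumes the compiled binary is available on PATH under the name `git-llm-bundler` (the README itself only shows `cargo run`), and the helper name `bundler_command` is hypothetical. Only flags documented above are used.

```rust
use std::process::Command;

/// Build (but do not run) a `--json` invocation of the bundler.
/// Assumption: the binary is installed on PATH as `git-llm-bundler`.
fn bundler_command(repo_url: &str, branch: Option<&str>, output: &str, max_kb: u32) -> Command {
    let mut cmd = Command::new("git-llm-bundler");
    cmd.args(["--repo-url", repo_url, "--output", output]);
    cmd.arg("--max-size").arg(max_kb.to_string());
    if let Some(b) = branch {
        // Omitting --branch leaves the repository's default branch in effect.
        cmd.args(["--branch", b]);
    }
    // --json conflicts with --verbose, so only one of the two is ever passed.
    cmd.arg("--json");
    cmd
}
```

Calling `.output()` on the returned command would run the tool and capture the JSON object from stdout, which can then be handed to whatever JSON parser the calling program already uses.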

📊 Example Outputs

Standard Output

Cloning repository: https://github.com/example/myproject.git
Checking out branch: feature/new-thing
Found 58 files to process
Successfully processed 58 files
Repository successfully bundled to: my_bundle.txt

Total lines in bundle: 10523
Total characters: 315890
GPT-4/3.5/4o (cl100k_base): 78,972 tokens
GPT-3 (p50k_base): 81,234 tokens
Legacy GPT-3 (r50k_base): 80,555 tokens

JSON Output (--json)

{
  "bundle_path": "/path/to/your/project/bundle.txt",
  "metrics": {
    "total_lines": 25012,
    "total_characters": 850450,
    "token_counts": {
      "GPT-3 (p50k_base)": 215600,
      "GPT-4/3.5/4o (cl100k_base)": 205100,
      "Legacy GPT-3 (r50k_base)": 211300
    }
  }
}

Bundle File Format (bundle.txt)

========== FILE: src/main.rs ==========
fn main() {
    println!("Hello, world!");
}
========== FILE: src/lib.rs ==========
pub fn add(left: usize, right: usize) -> usize {
    left + right
}
========== FILE: README.md ==========
# Project Title
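
Because every file is introduced by a header line, a bundle can be split back into its parts. The sketch below assumes headers always look exactly like `========== FILE: <path> ==========` as in the sample above; the function name `parse_bundle` is hypothetical, and a source line that happens to match the header pattern would be misread as a new file.

```rust
/// Split bundle text into (path, contents) pairs using the header format
/// shown in the sample bundle. Text before the first header is ignored.
fn parse_bundle(bundle: &str) -> Vec<(String, String)> {
    const PREFIX: &str = "========== FILE: ";
    const SUFFIX: &str = " ==========";

    let mut files: Vec<(String, String)> = Vec::new();
    for line in bundle.lines() {
        // A header line yields the path between the prefix and the suffix.
        let header = line
            .strip_prefix(PREFIX)
            .and_then(|rest| rest.strip_suffix(SUFFIX));

        if let Some(path) = header {
            files.push((path.to_string(), String::new()));
        } else if let Some((_, body)) = files.last_mut() {
            // Any other line belongs to the most recently opened file.
            body.push_str(line);
            body.push('\n');
        }
    }
    files
}
```

This can be handy for checking which files made it into a bundle, or for measuring how much each file contributes to the total size.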

Why git-llm-bundler?

  • Saves Time: Automates the tedious process of gathering code for LLMs.
  • Reduces Noise: Filters out irrelevant files (.git, binaries, large assets, ignored files).
  • Context Optimization: Provides clean, concatenated text, maximizing the value of
