1 unstable release: 0.1.0 (new), Apr 2, 2025
git-llm-bundler 📦📄
Bundle relevant files from a Git repository into a single text file, optimized for Large Language Model (LLM) context.
Stop manually copying and pasting code into LLMs! git-llm-bundler intelligently selects and bundles text files from any Git repository, creating a clean, context-ready file perfect for feeding into models like GPT-4, Claude, Gemini, and others.
It respects .gitignore, filters out binary files, ignores overly large files, and cleans up whitespace, giving you just the relevant content. Plus, it provides token counts for popular models!
✨ Features
- Git Integration: Clones any public or private (if accessible) Git repository.
- Branch Selection: Specify a branch, tag, or commit hash to bundle.
- Intelligent Filtering:
  - Respects .gitignore rules (both local and global).
  - Automatically skips binary files (images, executables, archives, etc.).
  - Excludes files exceeding a configurable size limit.
  - Ignores common unnecessary files/directories (.git, .vscode, node_modules, LICENSE; currently hardcoded).
- Clean Output: Removes excess blank lines while preserving indentation.
- Detailed Statistics: Reports total lines, characters, and estimated token counts for:
  - cl100k_base (GPT-4, GPT-3.5-Turbo, GPT-4o)
  - p50k_base (Older GPT-3)
  - r50k_base (Legacy GPT-3)
- Flexible Output:
  - Standard human-readable progress and stats.
  - --json mode for programmatic integration, outputting only a JSON object with the bundle path and metrics.
  - --verbose mode for detailed debugging information.
- Cross-Platform: Built with Rust, runs on Linux, macOS, and Windows.
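The "Clean Output" feature above can be pictured with a small sketch. This is not the tool's actual code: the function name `collapse_blank_lines` and the `max_blank` parameter are hypothetical, and the choice to emit whitespace-only lines as empty lines is an assumption. It only illustrates collapsing runs of blank lines while leaving the indentation of non-blank lines untouched.

```rust
/// Hypothetical helper (not part of git-llm-bundler): keep at most
/// `max_blank` consecutive blank lines and leave every non-blank line,
/// including its leading indentation, exactly as it was.
fn collapse_blank_lines(input: &str, max_blank: usize) -> String {
    let mut out = String::with_capacity(input.len());
    let mut blank_run = 0usize;
    for line in input.lines() {
        if line.trim().is_empty() {
            blank_run += 1;
            if blank_run > max_blank {
                continue; // drop the excess blank line
            }
            out.push('\n'); // assumption: whitespace-only lines become empty lines
        } else {
            blank_run = 0;
            out.push_str(line); // indentation is preserved verbatim
            out.push('\n');
        }
    }
    out
}
```

Calling `collapse_blank_lines(source, 1)` would keep single blank lines between code sections and drop the rest, which shrinks the bundle without changing how indented code reads.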
Requirements
- Rust installed
- Git command-line client installed (the tool currently shells out to git to fetch the repository).
🚀 Installation
Using Cargo
If you have Rust and Cargo installed, you can clone the repository and run the tool directly from source:
git clone https://github.com/cchance27/git-llm-bundler.git
cd git-llm-bundler
cargo run -- [OPTIONS] --repo-url <REPO_URL>
Required:
-r, --repo-url <REPO_URL>: The URL of the Git repository to clone (e.g., https://github.com/rust-lang/rust.git).
Options:
-o, --output <OUTPUT>: Path for the output bundle file [default: bundle.txt].
-b, --branch <BRANCH>: Specify a branch, tag, or commit hash to check out [default: repository's default branch].
-m, --max-size <MAX_SIZE>: Maximum size (in KB) for individual files to be included [default: 1000].
--json: Output results as a single JSON object to stdout (suppresses other console output). Conflicts with --verbose.
-v, --verbose: Print detailed processing information. Conflicts with --json.
-h, --help: Print help information.
-V, --version: Print version information.
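To make the effect of --max-size and the built-in filters more concrete, here is a minimal sketch of how such checks could fit together. It is an illustration under stated assumptions, not the tool's implementation: all function names are hypothetical, 1 KB is assumed to mean 1024 bytes, and the binary check is a common NUL-byte heuristic rather than whatever detection git-llm-bundler actually uses. Only the ignored names come from the feature list.

```rust
use std::path::{Component, Path};

/// Names skipped wherever they appear in a path (taken from the feature list).
const IGNORED_NAMES: [&str; 4] = [".git", ".vscode", "node_modules", "LICENSE"];

/// True if any component of `path` is one of the hardcoded ignored names.
fn is_ignored_path(path: &Path) -> bool {
    path.components().any(|component| match component {
        Component::Normal(name) => IGNORED_NAMES
            .iter()
            .any(|ignored| name.to_str() == Some(*ignored)),
        _ => false,
    })
}

/// True if a file of `len_bytes` fits within `max_kb`.
/// Assumption: 1 KB is treated as 1024 bytes.
fn within_size_limit(len_bytes: u64, max_kb: u64) -> bool {
    len_bytes <= max_kb.saturating_mul(1024)
}

/// Heuristic (an assumption): content is considered binary if a NUL byte
/// appears within its first 8000 bytes.
fn looks_binary(bytes: &[u8]) -> bool {
    bytes.iter().take(8000).any(|&b| b == 0)
}

/// Combine the three checks: a file is bundled only if it passes all of them.
fn should_include(path: &Path, bytes: &[u8], max_kb: u64) -> bool {
    !is_ignored_path(path)
        && within_size_limit(bytes.len() as u64, max_kb)
        && !looks_binary(bytes)
}
```

With the default of 1000, `should_include(Path::new("src/lib.rs"), contents, 1000)` would accept an ordinary source file, while anything under node_modules or any file containing NUL bytes would be rejected.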
Examples
- Basic Bundling (to bundle.txt)
cargo run -- --repo-url https://github.com/example/repo.git
- Bundling a Specific Branch to a Named File
cargo run -- -r https://github.com/example/repo.git -b master -o bundle-master.txt
- Bundling with a Smaller File Size Limit
cargo run -- -r https://github.com/example/repo.git -m 500 -o bundle-max-per-file.txt
- Debugging with Verbose Output
cargo run -- -r https://github.com/example/repo.git --verbose
- Getting JSON Output (for scripting)
cargo run -- -r https://github.com/example/repo.git --json > bundle.json
📊 Example Outputs
Standard Output
Cloning repository: https://github.com/example/myproject.git
Checking out branch: feature/new-thing
Found 58 files to process
Successfully processed 58 files
Repository successfully bundled to: my_bundle.txt
Total lines in bundle: 10523
Total characters: 315890
GPT-4/3.5/4o (cl100k_base): 78,972 tokens
GPT-3 (p50k_base): 81,234 tokens
Legacy GPT-3 (r50k_base): 80,555 tokens
JSON Output (--json)
{
  "bundle_path": "/path/to/your/project/bundle.txt",
  "metrics": {
    "total_lines": 25012,
    "total_characters": 850450,
    "token_counts": {
      "GPT-3 (p50k_base)": 215600,
      "GPT-4/3.5/4o (cl100k_base)": 205100,
      "Legacy GPT-3 (r50k_base)": 211300
    }
  }
}
Bundle File Format (bundle.txt)
========== FILE: src/main.rs ==========
fn main() {
    println!("Hello, world!");
}
========== FILE: src/lib.rs ==========
pub fn add(left: usize, right: usize) -> usize {
    left + right
}
========== FILE: README.md ==========
# Project Title
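Because every file in the bundle is introduced by a header line of the form shown above, a bundle can be split back into its files with a few lines of code. The sketch below is a hypothetical consumer, not something shipped with the tool: the function name `split_bundle` is invented for illustration, and it assumes the header is always exactly `========== FILE: <path> ==========` on its own line.

```rust
/// Hypothetical reader for the bundle format: returns (path, content) pairs
/// in the order they appear. Lines before the first header are ignored.
fn split_bundle(bundle: &str) -> Vec<(String, String)> {
    const PREFIX: &str = "========== FILE: ";
    const SUFFIX: &str = " ==========";
    let mut files: Vec<(String, String)> = Vec::new();
    for line in bundle.lines() {
        // A header line is the path wrapped in the fixed prefix and suffix.
        let header = line
            .strip_prefix(PREFIX)
            .and_then(|rest| rest.strip_suffix(SUFFIX));
        match header {
            Some(path) => files.push((path.to_string(), String::new())),
            None => {
                // Append the line to the most recently opened file, if any.
                if let Some((_, content)) = files.last_mut() {
                    content.push_str(line);
                    content.push('\n');
                }
            }
        }
    }
    files
}
```

One caveat of this approach: a source line that happens to look like a header would be read as the start of a new file, so the sketch is best suited to quick inspection rather than lossless round-tripping.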
Why git-llm-bundler?
- Saves Time: Automates the tedious process of gathering code for LLMs.
- Reduces Noise: Filters out irrelevant files (.git, binaries, large assets, ignored files).
- Context Optimization: Provides clean, concatenated text, maximizing the value of