#github #context #llm #repo #inference #txt-file #concatenation

app repocat

A tool to concatenate all code and text files in a GitHub repo for LLM inference contexts

5 releases

0.2.0 Aug 27, 2024
0.1.3 Aug 24, 2024
0.1.2 Aug 24, 2024
0.1.1 Aug 9, 2024
0.1.0 Aug 9, 2024

#217 in Filesystem

206 downloads per month

MIT license

10KB
137 lines

REPOCAT 🐱

This is a simple CLI tool that accepts either:

  1. a GitHub repo URL
  2. a path to a folder

and concatenates all text/code files into a single txt file. This makes the repo's contents easier to pass to an LLM as context.
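
As a rough illustration of the concatenation step, here is a minimal std-only sketch. It is not repocat's actual code: the function names `collect_files` and `concat_dir`, the `==> path <==` header format, and the choice to silently skip files that are not valid UTF-8 are all assumptions made for this example.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Recursively collect every regular file under `dir`, in sorted order.
/// (Hypothetical helper, not taken from repocat.)
fn collect_files(dir: &Path, files: &mut Vec<PathBuf>) -> io::Result<()> {
    let mut entries = fs::read_dir(dir)?.collect::<io::Result<Vec<_>>>()?;
    // Sort so the output order is deterministic across runs.
    entries.sort_by_key(|entry| entry.path());
    for entry in entries {
        // `DirEntry::file_type` does not follow symlinks, so links are skipped.
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            collect_files(&entry.path(), files)?;
        } else if file_type.is_file() {
            files.push(entry.path());
        }
    }
    Ok(())
}

/// Write each UTF-8 file under `root` to `out`, preceded by a header line
/// naming its path relative to `root`. Returns how many files were written.
fn concat_dir<W: Write>(root: &Path, out: &mut W) -> io::Result<usize> {
    let mut files = Vec::new();
    collect_files(root, &mut files)?;
    let mut written = 0;
    for path in &files {
        // Assumption: files that are not valid UTF-8 are skipped, not errors.
        let Ok(text) = fs::read_to_string(path) else { continue };
        let rel = path.strip_prefix(root).unwrap_or(path);
        writeln!(out, "==> {} <==", rel.display())?;
        out.write_all(text.as_bytes())?;
        // Make sure the next header starts on its own line.
        if !text.ends_with('\n') {
            writeln!(out)?;
        }
        written += 1;
    }
    Ok(written)
}
```

Calling `concat_dir(Path::new("some/folder"), &mut file)` with an open `std::fs::File` would produce a single text file along these lines. Handling a GitHub URL would additionally require fetching the repo first, which this sketch leaves out.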

What file extensions does it look for?

Check src/main.rs for the list of extensions. Feel free to open a PR to add more.
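
For readers who want a feel for what an extension filter can look like, here is a small sketch. The `ALLOWED_EXTENSIONS` list and the `has_allowed_extension` function are hypothetical and do not reproduce the list in src/main.rs; case-insensitive matching is also an assumption.

```rust
use std::path::Path;

/// Hypothetical allow-list for illustration only; the real list lives in
/// repocat's src/main.rs and may differ.
const ALLOWED_EXTENSIONS: &[&str] = &["rs", "py", "md", "txt", "toml"];

/// Return true if `path` has an extension that appears in `allowed`.
/// Matching is ASCII case-insensitive (an assumption of this sketch).
fn has_allowed_extension(path: &Path, allowed: &[&str]) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| allowed.iter().any(|a| a.eq_ignore_ascii_case(ext)))
        .unwrap_or(false)
}
```

Files with no extension at all, such as `Makefile`, are rejected by this check, so a real tool might want a separate list of well-known file names.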

Does it automatically filter some files?

Yes! repocat uses the ignore crate from ripgrep, meaning it ignores all of the following by default:

Files and directories that match glob patterns in these three categories:
    .gitignore globs (including global and repo-specific globs). This includes .gitignore files in parent directories that are part of the same git repository. (In ripgrep, the --no-require-git flag lifts this restriction.)
    .ignore globs, which take precedence over all gitignore globs when there's a conflict. This includes .ignore files in parent directories.
    .rgignore globs, which take precedence over all .ignore globs when there's a conflict. This includes .rgignore files in parent directories.
Hidden files and directories.
Binary files. (ripgrep considers any file with a NUL byte to be binary.)
Symbolic links aren't followed.
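
The two simplest rules above, hidden paths and NUL-byte binary detection, can be sketched with the standard library alone. This is an illustration of the rules as described, not the code used by the ignore crate or ripgrep; `is_hidden` and `looks_binary` are hypothetical names, and gitignore-style glob matching is deliberately left out.

```rust
use std::path::{Component, Path};

/// True if any named component of `path` starts with a dot,
/// e.g. `.git/config` or `src/.env`. A leading `./` does not count.
fn is_hidden(path: &Path) -> bool {
    path.components().any(|component| match component {
        Component::Normal(name) => name
            .to_str()
            .map_or(false, |s| s.starts_with('.')),
        // `.`, `..`, root and prefix components are not "hidden" names.
        _ => false,
    })
}

/// Heuristic described above: content containing a NUL byte is binary.
fn looks_binary(bytes: &[u8]) -> bool {
    bytes.contains(&0)
}
```

A walker could call `is_hidden` on each relative path and `looks_binary` on the bytes it reads before deciding whether to include a file.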

Dependencies

~7–20MB
~302K SLoC