#processing #parallel-processing #directory #file #file-content #output #command-line-tool

combiner (bin+lib)

Combiner is a Rust-based command-line tool that processes text files in a given directory, combining their contents into a single output file.

13 releases

0.2.2 Aug 19, 2024
0.2.1 Aug 15, 2024
0.1.10 Aug 15, 2024
0.1.7 Jun 25, 2024

#1604 in Command line utilities

Download history: 519/week @ 2024-06-22, 25/week @ 2024-06-29, 523/week @ 2024-08-10, 162/week @ 2024-08-17, 10/week @ 2024-08-24

263 downloads per month

MIT license

15KB
248 lines

Combiner

Combiner is a Rust-based command-line tool that processes text files in a given directory, combining their contents into a single output file. This tool is particularly useful for providing context to Large Language Models (LLMs) about the files in a project, streamlining the process of getting debugging advice or a project overview.

Features

  • Recursively scans directories for text files
  • Token counting using the tiktoken-rs library
  • Parallel processing with Rayon for faster throughput (see the sketch after this list)
  • Detailed output statistics
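
As a rough illustration of how these pieces fit together, here is a minimal sketch (not Combiner's actual code) of counting tokens across files in parallel: it combines Rayon's parallel iterators with tiktoken-rs's p50k_base encoding, and the total_tokens helper is hypothetical.

    use rayon::prelude::*;
    use std::{fs, path::PathBuf};
    use tiktoken_rs::p50k_base;

    // Hypothetical helper: count tokens across many files in parallel.
    fn total_tokens(paths: &[PathBuf]) -> usize {
        let bpe = p50k_base().expect("failed to load the p50k_base tokenizer");
        paths
            .par_iter()
            // Skip files that cannot be read as UTF-8 text.
            .filter_map(|p| fs::read_to_string(p).ok())
            // Tokenize each file's contents and count the tokens.
            .map(|text| bpe.encode_with_special_tokens(&text).len())
            .sum()
    }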

Installation

Prerequisites

  • A recent Rust toolchain (cargo), e.g. installed via rustup

Building from source

  1. Clone the repository:

    git clone https://github.com/jesalx/combiner.git
    cd combiner
    
  2. Build the project:

    cargo build --release
    
  3. The binary will be available at target/release/combiner

Alternatively, you can install combiner using cargo:

cargo install combiner

Usage

Basic usage:

combiner -d <directory> -o <output> -t <tokenizer>

For more options:

combiner --help

Command-line Options

  • -d, --directory <directory>: Input directory to process (default: current directory)
  • -o, --output <output>: Output file path/name
  • -t, --tokenizer <tokenizer>: Tokenizer to use (default: p50k_base)
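
For example, an invocation using these options might look like the following (the directory and output file names here are illustrative):

combiner -d ./my-project -o combined.txt -t p50k_base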

Output

The program generates a single output file containing the contents of all processed text files. Each file's content is preceded by its file path and separated by a line of dashes.
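
As an illustration of that layout (the separator's exact width and placement are assumptions, not taken from Combiner's source), writing a single entry might look like this:

    use std::io::Write;

    // Illustrative sketch: write one file's entry as a path header,
    // a dashed separator line, then the file's contents.
    // The 40-dash separator width is an assumption.
    fn append_entry(out: &mut impl Write, path: &str, contents: &str) -> std::io::Result<()> {
        writeln!(out, "{path}")?;
        writeln!(out, "{}", "-".repeat(40))?;
        writeln!(out, "{contents}")?;
        Ok(())
    }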

The program also prints a summary table showing:

  • Number of files processed
  • Total number of tokens
  • Output file path
  • Processing time
  • Top file by token count

Dependencies

~19–28MB
~257K SLoC