#merge-sorting #sorting #cli #merge

bin+lib gsort

A fast, memory-efficient external merge sort implementation in Rust, compatible with GNU sort

2 unstable releases

0.6.1 Oct 13, 2025
0.5.1 Oct 8, 2025

#660 in Algorithms

Download history: 332/week @ 2025-10-08, 42/week @ 2025-10-15, 5/week @ 2025-10-22

379 downloads per month

Apache-2.0

135KB
3K SLoC

Rust 2.5K SLoC // 0.1% comments
Shell 455 SLoC // 0.1% comments
Batch 160 SLoC // 0.1% comments

gsort - High-Performance External Merge Sort

A fast, memory-efficient external merge sort implementation in Rust, compatible with GNU sort.

Features

  • High Performance: Faster than GNU sort for many workloads

    • 24M file: 0.055s (69% faster than GNU sort)
    • 476M file: 0.809s (33% faster than GNU sort)
    • 4.7G file: 9.2s (no temp files, single pass)
  • Smart Memory Management

    • Automatic buffer sizing based on file size
    • No temp files for files that fit in memory (up to 32GB)
    • Efficient external merge sort for larger files
  • Multi-threaded: Parallel sorting for large datasets

  • GNU sort Compatible: Drop-in replacement

  • Flexible: Command-line tool or Rust library
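The external merge sort named in the feature list can be sketched with the standard library alone. This is a minimal illustration, not gsort's actual implementation: it limits chunks by line count rather than by a byte budget, merges all runs in a single pass (ignoring any batch-size limit), compares lines as plain strings, and uses the hypothetical names `write_run`, `make_runs`, `merge_runs`, and `external_sort`.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::fs::{self, File};
use std::io::{self, BufRead, BufReader, BufWriter, Write};
use std::path::{Path, PathBuf};

/// Sort `chunk` in memory and write it to a run file (hypothetical naming scheme).
fn write_run(chunk: &mut Vec<String>, dir: &Path, index: usize) -> io::Result<PathBuf> {
    chunk.sort();
    let path = dir.join(format!("run-{index}.tmp"));
    let mut out = BufWriter::new(File::create(&path)?);
    for line in chunk.drain(..) {
        writeln!(out, "{line}")?;
    }
    out.flush()?;
    Ok(path)
}

/// Phase 1: split the input into sorted runs of at most `max_lines` lines each.
fn make_runs(input: impl BufRead, dir: &Path, max_lines: usize) -> io::Result<Vec<PathBuf>> {
    let mut runs: Vec<PathBuf> = Vec::new();
    let mut chunk: Vec<String> = Vec::new();
    for line in input.lines() {
        chunk.push(line?);
        if chunk.len() >= max_lines {
            let index = runs.len();
            runs.push(write_run(&mut chunk, dir, index)?);
        }
    }
    if !chunk.is_empty() {
        let index = runs.len();
        runs.push(write_run(&mut chunk, dir, index)?);
    }
    Ok(runs)
}

/// Phase 2: k-way merge of the sorted runs using a min-heap of each run's current line.
fn merge_runs(runs: &[PathBuf], mut out: impl Write) -> io::Result<()> {
    let mut readers = Vec::new();
    for path in runs {
        readers.push(BufReader::new(File::open(path)?).lines());
    }
    // Heap entries are (line, run index); `Reverse` turns the max-heap into a min-heap.
    let mut heap = BinaryHeap::new();
    for (i, reader) in readers.iter_mut().enumerate() {
        if let Some(line) = reader.next() {
            heap.push(Reverse((line?, i)));
        }
    }
    while let Some(Reverse((line, i))) = heap.pop() {
        writeln!(out, "{line}")?;
        // Refill the heap from the run that just supplied the smallest line.
        if let Some(next) = readers[i].next() {
            heap.push(Reverse((next?, i)));
        }
    }
    Ok(())
}

/// Sort `input` into `out`, using `dir` for temporary run files.
fn external_sort(input: impl BufRead, out: impl Write, dir: &Path, max_lines: usize) -> io::Result<()> {
    let runs = make_runs(input, dir, max_lines)?;
    let result = merge_runs(&runs, out);
    for path in &runs {
        let _ = fs::remove_file(path); // best-effort cleanup of temporary runs
    }
    result
}
```

Calling `external_sort(reader, writer, dir, max_lines)` with a small `max_lines` forces several runs and exercises the merge phase. An input that fits in one chunk still goes through a single run file here, whereas the feature list says gsort avoids temp files in that case.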

Quick Start

Download pre-built binaries or build from source:

cargo build --release
sudo cp target/release/gsort /usr/local/bin/

Usage

gsort input.txt > sorted.txt
gsort -r input.txt > sorted.txt         # Reverse
gsort -n numbers.txt > sorted.txt       # Numeric
gsort -S 2G large_file.txt > sorted.txt # Custom buffer
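The `-S` value combines a number with an optional size suffix. The sketch below shows one way such a value could be turned into a byte count. It assumes binary multiples (K = 1024 bytes, and so on) and treats a bare number as KiB, as the `-S` help text states. `parse_size` is a hypothetical helper, not part of gsort's API, and the real parser may accept more forms.

```rust
/// Parse a size such as "256K", "64M" or "2G" into bytes.
/// Assumption: suffixes are powers of 1024 and a bare number means KiB.
fn parse_size(text: &str) -> Option<u64> {
    let text = text.trim();
    let (digits, multiplier) = match text.chars().last()? {
        'K' | 'k' => (&text[..text.len() - 1], 1u64 << 10),
        'M' | 'm' => (&text[..text.len() - 1], 1u64 << 20),
        'G' | 'g' => (&text[..text.len() - 1], 1u64 << 30),
        // No suffix: interpret the whole string as a KiB count.
        c if c.is_ascii_digit() => (text, 1u64 << 10),
        _ => return None,
    };
    // `checked_mul` rejects values that would overflow a u64.
    digits.parse::<u64>().ok()?.checked_mul(multiplier)
}
```

Returning `Option` keeps the sketch short; a real command-line parser would more likely return an error type that names the offending input.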

See LIBRARY.md for library usage. See PACKAGING.md for building packages.

Help

$ gsort --help
Configuration for the sort operation.

This struct holds all command-line options and provides methods for parsing arguments and accessing configuration values.

# Examples

```
use gsort::Config;
use clap::Parser;

// Parse from command-line arguments
let config = Config::parse_from(&["gsort", "-r", "-n", "data.txt"]);
assert!(config.reverse);
assert!(config.numeric);
```

Usage: gsort.exe [OPTIONS] [FILE]

Arguments:
  [FILE]
          Input file as the last positional argument. If provided together with --files0-from, it's an error

Options:
  -S, --buffer-size <BUFFER_SIZE>
          Max memory for in-memory chunk (e.g. 256K, 64M). Default KiB if no suffix

          [default: 16M]

      --batch-size <BATCH_SIZE>
          Max runs merged at once (batch size)

          [default: 16]

  -r, --reverse
          Reverse order

  -u, --unique
          Unique: suppress duplicate lines in output

  -c, --check
          Check: do not output; exit status 0 if input is sorted

  -C, --check-quiet
          Check silently: like -c, but do not report first bad line

      --key-start <KEY_START>
          Key start (field[.char]) 1-based, e.g., 2 or 2.3 (deprecated, use -k instead)

      --key-end <KEY_END>
          Key end (field[.char]) 1-based, inclusive field index (deprecated, use -k instead)

  -k, --key <KEYDEF>
          Key definition: F[.C][OPTS][,F[.C][OPTS]] (can specify multiple times)

  -b, --skip-blanks
          Skip blanks at key boundaries (-b)

  -z, --zero-terminated
          Zero-terminated lines (NUL) instead of newline

  -t, --field-separator <FIELD_SEPARATOR>
          Field separator (default: whitespace)

  -o, --output <OUTPUT>
          Output file (default: stdout)

  -m, --merge
          Merge already-sorted inputs; do not sort

      --files0-from <FILES0_FROM>
          Read input file names (NUL-terminated) from file F (or '-' for stdin)

  -f, --ignore-case
          Ignore case

  -d, --dictionary-order
          Dictionary order (letters, digits, blanks only)

  -i, --ignore-nonprinting
          Ignore nonprinting

  -n, --numeric-sort
          Numeric sort

  -g, --general-numeric-sort
          General numeric sort (with scientific notation)

      --human-numeric-sort
          Human readable numeric sort (e.g. 2K 1G). Use --human-numeric-sort to avoid -h(help) clash

  -M, --month-sort
          Month sort (JAN..DEC)

  -V, --version-sort
          Version sort

      --sort <SORT_WORD>
          --sort=WORD selector: numeric, general-numeric, human-numeric, month, version

  -v, --verbose
          Verbose: print debug timings and phase info

  -s, --stable
          Stable sort: preserve original order of equal lines

  -T, --temporary-directory <TEMP_DIRS>
          Temporary directory for intermediate files (can specify multiple times)

      --debug
          Debug mode: annotate the part of the line used to sort

  -R, --random-sort
          Random sort: shuffle lines but group identical keys together

      --random-source <RANDOM_SOURCE>
          Random source file for -R (default: system random)

      --parallel <PARALLEL>
          Number of parallel sort threads (default: number of CPUs)

          [default: 20]

      --compress
          Compress temporary files (gzip compression)

  -h, --help
          Print help (see a summary with '-h')
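The `-k` key syntax shown in the help text, `F[.C][OPTS][,F[.C][OPTS]]`, can be parsed roughly as follows. This is a sketch under assumptions, not gsort's parser: `KeyPos`, `parse_pos`, and `parse_key` are hypothetical names, option letters are accepted without checking that they are valid sort options, and only a field number of 0 is rejected as out of range.

```rust
/// One end of a key: 1-based field, optional character offset, option letters.
#[derive(Debug, PartialEq)]
struct KeyPos {
    field: usize,
    char_pos: Option<usize>,
    opts: String,
}

/// Parse one `F[.C][OPTS]` position; returns None on malformed input.
fn parse_pos(text: &str) -> Option<KeyPos> {
    // Option letters (e.g. "n", "r") trail the numeric part.
    let split = text
        .find(|c: char| c.is_ascii_alphabetic())
        .unwrap_or(text.len());
    let (numbers, opts) = text.split_at(split);
    if !opts.chars().all(|c| c.is_ascii_alphabetic()) {
        return None;
    }
    let (field, char_pos) = match numbers.split_once('.') {
        Some((f, c)) => (f.parse::<usize>().ok()?, Some(c.parse::<usize>().ok()?)),
        None => (numbers.parse::<usize>().ok()?, None),
    };
    if field == 0 {
        return None; // fields are 1-based
    }
    Some(KeyPos { field, char_pos, opts: opts.to_string() })
}

/// Parse a full `-k` argument into (start, optional end).
fn parse_key(def: &str) -> Option<(KeyPos, Option<KeyPos>)> {
    match def.split_once(',') {
        Some((start, end)) => Some((parse_pos(start)?, Some(parse_pos(end)?))),
        None => Some((parse_pos(def)?, None)),
    }
}
```

For example, `parse_key("2.3n,4r")` yields a start position at field 2, character 3 with option `n`, and an end position at field 4 with option `r`.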

Dependencies

~9–39MB
~594K SLoC