2 unstable releases

| Version | Released |
|---|---|
| 0.6.1 | Oct 13, 2025 |
| 0.5.1 | Oct 8, 2025 |
#660 in Algorithms
379 downloads per month
135KB, 3K SLoC
gsort - High-Performance External Merge Sort
A fast, memory-efficient external merge sort implementation in Rust, compatible with GNU sort.
Features
- ✅ High Performance: Faster than GNU sort for many workloads
  - 24M file: 0.055s (69% faster than GNU sort)
  - 476M file: 0.809s (33% faster than GNU sort)
  - 4.7G file: 9.2s (no temp files, single pass)
- ✅ Smart Memory Management
  - Automatic buffer sizing based on file size
  - No temp files for files that fit in memory (up to 32GB)
  - Efficient external merge sort for larger files
- ✅ Multi-threaded: Parallel sorting for large datasets
- ✅ GNU sort Compatible: Drop-in replacement
- ✅ Flexible: Command-line tool or Rust library
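The "external merge sort for larger files" point refers to the classic two-phase scheme: sort memory-sized chunks into runs, then k-way merge the runs with a min-heap. The sketch below is illustrative only. It assumes plain byte-order comparison (GNU sort uses locale collation unless `LC_ALL=C`) and keeps the runs in memory, where a real external sort would spill each run to a temp file. `make_runs`, `merge_runs` and `external_sort` are hypothetical names, not gsort's library API.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Phase 1: split the input into chunks of `chunk_lines` and sort each one.
/// A real external sort would write each sorted run to a temp file; here the
/// runs stay in memory so the sketch is self-contained.
fn make_runs(lines: &[&str], chunk_lines: usize) -> Vec<Vec<String>> {
    lines
        .chunks(chunk_lines.max(1))
        .map(|chunk| {
            let mut run: Vec<String> = chunk.iter().map(|s| s.to_string()).collect();
            run.sort(); // stable sort, byte order
            run
        })
        .collect()
}

/// Phase 2: k-way merge of sorted runs using a min-heap of (line, run index).
fn merge_runs(runs: Vec<Vec<String>>) -> Vec<String> {
    let mut iters: Vec<std::vec::IntoIter<String>> =
        runs.into_iter().map(|r| r.into_iter()).collect();
    let mut heap = BinaryHeap::new();
    // Seed the heap with the first line of every run.
    for (i, it) in iters.iter_mut().enumerate() {
        if let Some(line) = it.next() {
            heap.push(Reverse((line, i)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((line, i))) = heap.pop() {
        out.push(line);
        // Refill from the run that just supplied the smallest line.
        if let Some(next) = iters[i].next() {
            heap.push(Reverse((next, i)));
        }
    }
    out
}

fn external_sort(lines: &[&str], chunk_lines: usize) -> Vec<String> {
    merge_runs(make_runs(lines, chunk_lines))
}
```

Because heap entries are `(line, run index)` pairs, equal lines leave the heap in run order, so ties keep their input order across runs.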
Quick Start
Download pre-built binaries or build from source:
```sh
cargo build --release
sudo cp target/release/gsort /usr/local/bin/
```
Usage
```sh
gsort input.txt > sorted.txt
gsort -r input.txt > sorted.txt            # Reverse
gsort -n numbers.txt > sorted.txt          # Numeric
gsort -S 2G large_file.txt > sorted.txt    # Custom buffer size
```
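The `-S 2G` example sets the in-memory buffer. According to the `--buffer-size` help entry, sizes take a suffix such as `256K` or `64M` and default to KiB when no suffix is given. A minimal parser for that format might look like this, assuming binary multiples (K = 1024) and GNU sort's convention that a `b` suffix means bytes; `parse_buffer_size` is a hypothetical name, and GNU's `%` form is not handled.

```rust
/// Parse a buffer size such as "256K", "64M" or "2G" into bytes.
/// Assumptions: binary multiples, a bare number means KiB (per the help text),
/// and "b" means bytes (GNU sort convention). Returns None on bad input.
fn parse_buffer_size(s: &str) -> Option<u64> {
    let s = s.trim();
    // Split the leading digits from the (optional) suffix.
    let (digits, suffix) = match s.find(|c: char| !c.is_ascii_digit()) {
        Some(pos) => s.split_at(pos),
        None => (s, ""),
    };
    let n: u64 = digits.parse().ok()?;
    let multiplier: u64 = match suffix.to_ascii_uppercase().as_str() {
        "" | "K" => 1 << 10, // no suffix defaults to KiB
        "B" => 1,
        "M" => 1 << 20,
        "G" => 1 << 30,
        "T" => 1 << 40,
        _ => return None,
    };
    n.checked_mul(multiplier) // None on overflow
}
```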
See LIBRARY.md for library usage. See PACKAGING.md for building packages.
Help
$ gsort --help
Configuration for the sort operation.
This struct holds all command-line options and provides methods for parsing arguments and accessing configuration values.
# Examples
```rust
use clap::Parser;
use gsort::Config;

// Parse from command-line arguments
let config = Config::parse_from(&["gsort", "-r", "-n", "data.txt"]);
assert!(config.reverse);
assert!(config.numeric);
```
Usage: gsort.exe [OPTIONS] [FILE]
Arguments:
[FILE]
Input file as the last positional argument. If provided together with --files0-from, it's an error
Options:
-S, --buffer-size <BUFFER_SIZE>
Max memory for in-memory chunk (e.g. 256K, 64M). Default KiB if no suffix
[default: 16M]
--batch-size <BATCH_SIZE>
Max runs merged at once (batch size)
[default: 16]
-r, --reverse
Reverse order
-u, --unique
Unique: suppress duplicate lines in output
-c, --check
Check: do not output; exit status 0 if input is sorted
-C, --check-quiet
Check silently: like -c, but do not report first bad line
--key-start <KEY_START>
Key start (field[.char]) 1-based, e.g., 2 or 2.3 (deprecated, use -k instead)
--key-end <KEY_END>
Key end (field[.char]) 1-based, inclusive field index (deprecated, use -k instead)
-k, --key <KEYDEF>
Key definition: F[.C][OPTS][,F[.C][OPTS]] (can specify multiple times)
-b, --skip-blanks
Skip blanks at key boundaries (-b)
-z, --zero-terminated
Zero-terminated lines (NUL) instead of newline
-t, --field-separator <FIELD_SEPARATOR>
Field separator (default: whitespace)
-o, --output <OUTPUT>
Output file (default: stdout)
-m, --merge
Merge already-sorted inputs; do not sort
--files0-from <FILES0_FROM>
Read input file names (NUL-terminated) from file F (or '-' for stdin)
-f, --ignore-case
Ignore case
-d, --dictionary-order
Dictionary order (letters, digits, blanks only)
-i, --ignore-nonprinting
Ignore nonprinting
-n, --numeric-sort
Numeric sort
-g, --general-numeric-sort
General numeric sort (with scientific notation)
--human-numeric-sort
Human-readable numeric sort (e.g. 2K, 1G). Long form only, to avoid clashing with -h (help)
-M, --month-sort
Month sort (JAN..DEC)
-V, --version-sort
Version sort
--sort <SORT_WORD>
--sort=WORD selector: numeric, general-numeric, human-numeric, month, version
-v, --verbose
Verbose: print debug timings and phase info
-s, --stable
Stable sort: preserve original order of equal lines
-T, --temporary-directory <TEMP_DIRS>
Temporary directory for intermediate files (can specify multiple times)
--debug
Debug mode: annotate the part of the line used to sort
-R, --random-sort
Random sort: shuffle lines but group identical keys together
--random-source <RANDOM_SOURCE>
Random source file for -R (default: system random)
--parallel <PARALLEL>
Number of parallel sort threads (default: number of CPUs)
[default: 20]
--compress
Compress temporary files (gzip compression)
-h, --help
Print help (see a summary with '-h')
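`--batch-size` caps how many runs are merged at once (16 by default). When there are more runs than that, a common approach is to merge them in groups over several passes, which takes about ceil(log_B R) passes for R runs and batch size B. The hypothetical helper below only counts those passes under that assumption; it is not gsort's code.

```rust
/// Count merge passes when at most `batch_size` runs may be merged at once.
/// Each pass turns every group of up to `batch_size` runs into one run,
/// until a single run remains.
fn merge_passes(mut runs: usize, batch_size: usize) -> usize {
    assert!(batch_size >= 2, "batch size must be at least 2");
    let mut passes = 0;
    while runs > 1 {
        runs = (runs + batch_size - 1) / batch_size; // ceil(runs / batch_size)
        passes += 1;
    }
    passes
}
```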
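For `-c` and `-C`, checking reduces to comparing adjacent lines. The sketch below assumes plain byte-order comparison and uses a hypothetical `first_unsorted` helper that returns the 1-based number of the first out-of-order line (what `-c` reports and `-C` stays silent about). With `-u`, GNU sort checks for strict ordering, so equal neighbours also count as disorder.

```rust
/// Return the 1-based line number of the first line that breaks the order,
/// or None if the input is sorted. With `unique`, equal neighbours also fail.
fn first_unsorted(lines: &[&str], unique: bool) -> Option<usize> {
    lines
        .windows(2)
        .position(|w| if unique { w[0] >= w[1] } else { w[0] > w[1] })
        // Window i compares lines i and i+1 (0-based); the offender is i+1,
        // which is line i+2 when counting from 1.
        .map(|i| i + 2)
}
```

A caller would exit with status 0 when this returns `None` and status 1 otherwise.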
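The `-k` syntax `F[.C][OPTS][,F[.C][OPTS]]` splits into a start position and an optional end position. Each position has a 1-based field, an optional 1-based character offset, and trailing ordering letters. A sketch of a parser for that grammar; `KeyPos`, `KeyDef`, `parse_pos` and `parse_keydef` are hypothetical names, and the option letters are kept as a string rather than validated.

```rust
/// One end of a key: field number, optional character offset, option letters.
#[derive(Debug, PartialEq)]
struct KeyPos {
    field: usize,               // 1-based field number
    char_offset: Option<usize>, // optional 1-based character within the field
    opts: String,               // ordering letters such as "n", "r", "b"
}

/// A parsed KEYDEF: start position plus optional end position.
#[derive(Debug, PartialEq)]
struct KeyDef {
    start: KeyPos,
    end: Option<KeyPos>, // None: the key extends to the end of the line
}

/// Parse "F[.C][OPTS]".
fn parse_pos(s: &str) -> Option<KeyPos> {
    // Everything from the first letter onwards is treated as option letters.
    let split = s.find(|c: char| c.is_ascii_alphabetic()).unwrap_or(s.len());
    let (num, opts) = s.split_at(split);
    if !opts.chars().all(|c| c.is_ascii_alphabetic()) {
        return None;
    }
    let (field, char_offset): (usize, Option<usize>) = match num.split_once('.') {
        Some((f, c)) => (f.parse().ok()?, Some(c.parse().ok()?)),
        None => (num.parse().ok()?, None),
    };
    if field == 0 {
        return None; // fields are 1-based
    }
    Some(KeyPos { field, char_offset, opts: opts.to_string() })
}

/// Parse "F[.C][OPTS][,F[.C][OPTS]]".
fn parse_keydef(s: &str) -> Option<KeyDef> {
    let (start, end) = match s.split_once(',') {
        Some((a, b)) => (a, Some(b)),
        None => (s, None),
    };
    let start = parse_pos(start)?;
    let end = match end {
        Some(e) => Some(parse_pos(e)?),
        None => None,
    };
    Some(KeyDef { start, end })
}
```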
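Once a key is parsed, the field it names has to be cut out of each line. This depends on `-t` (field separator, default whitespace) and `-b` (skip leading blanks). GNU sort's default split treats a field as a run of non-blanks together with the blanks before it, so leading blanks stay part of the key unless `-b` is given. The hypothetical `get_field` below sketches that behaviour for ASCII space and tab only.

```rust
/// Return field `n` (1-based) of `line`.
/// - `sep = Some(c)`: every `c` ends a field, like `-t c`.
/// - `sep = None`: a field is a run of non-blanks plus the blanks before it.
/// `skip_blanks` mimics `-b` by dropping the key's leading blanks.
fn get_field(line: &str, sep: Option<char>, n: usize, skip_blanks: bool) -> &str {
    if n == 0 {
        return ""; // fields are 1-based
    }
    let is_blank = |c: char| c == ' ' || c == '\t';
    let raw = match sep {
        Some(s) => line.split(s).nth(n - 1).unwrap_or(""),
        None => {
            // Field boundaries sit where a blank follows a non-blank.
            let mut bounds = vec![0];
            let mut prev_blank = true;
            for (i, c) in line.char_indices() {
                let blank = is_blank(c);
                if blank && !prev_blank {
                    bounds.push(i);
                }
                prev_blank = blank;
            }
            bounds.push(line.len());
            if n < bounds.len() { &line[bounds[n - 1]..bounds[n]] } else { "" }
        }
    };
    if skip_blanks { raw.trim_start_matches(is_blank) } else { raw }
}
```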
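Both `-z` (NUL-terminated lines) and `--files0-from` (NUL-terminated file names) read records separated by NUL bytes instead of newlines. A small sketch of that split, using a hypothetical `split_nul` helper, where a trailing NUL does not create an extra empty record:

```rust
/// Split a NUL-terminated buffer into records. A trailing NUL terminates
/// the last record rather than starting an empty one.
fn split_nul(buf: &[u8]) -> Vec<&[u8]> {
    let mut recs: Vec<&[u8]> = buf.split(|&b| b == 0).collect();
    if recs.last().map_or(false, |r| r.is_empty()) {
        recs.pop();
    }
    recs
}
```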
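`-f`, `-d` and `-i` do not reorder anything themselves; they change the key that gets compared. GNU sort folds lower case to upper case for `-f`, keeps only letters, digits and blanks for `-d`, and drops non-printing characters for `-i`. The hypothetical `transform_key` below sketches these rules for ASCII only, whereas GNU sort applies them according to the locale.

```rust
/// Build the comparison key for -f (fold to upper case), -d (letters,
/// digits and blanks only) and -i (drop non-printing characters).
/// ASCII-only sketch.
fn transform_key(s: &str, fold: bool, dict: bool, ignore_np: bool) -> String {
    s.chars()
        .filter(|c| !dict || c.is_ascii_alphanumeric() || *c == ' ' || *c == '\t')
        .filter(|c| !ignore_np || !c.is_ascii_control())
        .map(|c| if fold { c.to_ascii_uppercase() } else { c })
        .collect()
}
```

A sort with `-f` could then use `lines.sort_by_key(|l| transform_key(l, true, false, false))`.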
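`-n` and `-g` differ in what they parse. `-n` reads a leading decimal number (no exponent), and a line without one counts as 0. `-g` parses a floating-point value, so `1e3` works. The sketch below makes several simplifying assumptions: it uses `f64` instead of exact decimal comparison, ignores thousands separators, and for `-g` parses only the first blank-separated token. Unparsable lines sort first, then NaN, then numbers, following GNU's documented order. All function names are hypothetical.

```rust
use std::cmp::Ordering;

/// -n sketch: value of the leading number (optional blanks, optional '-',
/// digits, optional '.' and digits). No leading number counts as 0.
fn numeric_prefix(line: &str) -> f64 {
    let s = line.trim_start_matches(|c: char| c == ' ' || c == '\t');
    let bytes = s.as_bytes();
    let mut end = 0;
    if end < bytes.len() && bytes[end] == b'-' {
        end += 1;
    }
    while end < bytes.len() && bytes[end].is_ascii_digit() {
        end += 1;
    }
    if end < bytes.len() && bytes[end] == b'.' {
        end += 1;
        while end < bytes.len() && bytes[end].is_ascii_digit() {
            end += 1;
        }
    }
    // Only ASCII bytes were consumed, so `end` is a valid char boundary.
    s[..end].parse().unwrap_or(0.0)
}

fn cmp_numeric(a: &str, b: &str) -> Ordering {
    // numeric_prefix never yields NaN, so partial_cmp always succeeds.
    numeric_prefix(a).partial_cmp(&numeric_prefix(b)).unwrap()
}

/// -g sketch: parse the first blank-separated token as a float.
fn general_key(line: &str) -> Option<f64> {
    line.split_whitespace().next()?.parse().ok()
}

/// Unparsable lines first, then NaN, then numbers in ascending order.
fn cmp_general(a: &str, b: &str) -> Ordering {
    match (general_key(a), general_key(b)) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => Ordering::Less,
        (Some(_), None) => Ordering::Greater,
        (Some(x), Some(y)) => match (x.is_nan(), y.is_nan()) {
            (true, true) => Ordering::Equal,
            (true, false) => Ordering::Less,
            (false, true) => Ordering::Greater,
            (false, false) => x.partial_cmp(&y).unwrap(),
        },
    }
}
```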
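For `--human-numeric-sort`, a simple approach multiplies out the suffix so that `2K`, `3M` and `1G` become comparable numbers. This is an approximation, not GNU's exact rule. GNU sort orders by sign, then by SI suffix, then by value, so the two agree on consistently scaled input such as `du -h` output but differ on values like `1500K` versus `1M`. The hypothetical `human_value` assumes binary powers and the suffixes K (either case), M, G, T and P.

```rust
/// --human-numeric-sort sketch: "2K" -> 2048, "1.5M" -> 1572864, "512" -> 512.
/// Returns None for an unknown suffix or an unparsable number.
fn human_value(token: &str) -> Option<f64> {
    let t = token.trim();
    // Peel off a trailing letter as the suffix, if there is one.
    let (num, suffix) = match t.char_indices().last() {
        Some((i, c)) if c.is_ascii_alphabetic() => (&t[..i], Some(c)),
        _ => (t, None),
    };
    let exp = match suffix {
        None => 0,
        Some('k') | Some('K') => 1,
        Some('M') => 2,
        Some('G') => 3,
        Some('T') => 4,
        Some('P') => 5,
        Some(_) => return None,
    };
    let n: f64 = num.parse().ok()?;
    Some(n * 1024f64.powi(exp))
}
```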
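`-M` compares the first three letters of the key as a month name, ignoring leading blanks and case. In GNU sort's C locale, anything unrecognised sorts before `JAN`. A sketch with a hypothetical `month_rank` that maps lines to 0 (unknown) through 12:

```rust
/// -M sketch: rank 1..=12 for JAN..DEC, 0 for anything else.
fn month_rank(line: &str) -> u8 {
    const MONTHS: [&str; 12] = [
        "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
        "JUL", "AUG", "SEP", "OCT", "NOV", "DEC",
    ];
    // Leading blanks are ignored; comparison is case-insensitive.
    let prefix: String = line.trim_start().chars().take(3).collect::<String>().to_ascii_uppercase();
    MONTHS.iter().position(|m| *m == prefix).map_or(0, |i| i as u8 + 1)
}
```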
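`-V` sorts so that `v1.9` comes before `v1.10`. GNU sort implements this with its filevercmp rules, which also handle details such as file suffixes. The sketch below is a simpler natural ordering: digit runs compare numerically (leading zeros ignored), other bytes compare byte-wise. `version_cmp` is a hypothetical name.

```rust
use std::cmp::Ordering;

/// Simplified version comparison (not GNU's exact filevercmp).
fn version_cmp(a: &str, b: &str) -> Ordering {
    let (mut a, mut b) = (a.as_bytes(), b.as_bytes());
    loop {
        match (a.first(), b.first()) {
            (None, None) => return Ordering::Equal,
            (None, Some(_)) => return Ordering::Less,
            (Some(_), None) => return Ordering::Greater,
            (Some(x), Some(y)) if x.is_ascii_digit() && y.is_ascii_digit() => {
                let la = a.iter().take_while(|c| c.is_ascii_digit()).count();
                let lb = b.iter().take_while(|c| c.is_ascii_digit()).count();
                // Compare without overflow: after stripping leading zeros,
                // a longer digit string is larger; otherwise compare bytes.
                let na = trim_zeros(&a[..la]);
                let nb = trim_zeros(&b[..lb]);
                let ord = na.len().cmp(&nb.len()).then_with(|| na.cmp(nb));
                if ord != Ordering::Equal {
                    return ord;
                }
                a = &a[la..];
                b = &b[lb..];
            }
            (Some(x), Some(y)) => {
                if x != y {
                    return x.cmp(y);
                }
                a = &a[1..];
                b = &b[1..];
            }
        }
    }
}

fn trim_zeros(d: &[u8]) -> &[u8] {
    let first = d.iter().position(|&c| c != b'0').unwrap_or(d.len());
    &d[first..]
}
```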
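`-s` matters because GNU sort, by default, breaks ties between equal keys by comparing the whole lines as a last resort. `-s` disables that fallback so equal keys keep their input order. In Rust, `slice::sort_by` is already stable, so the difference comes down to whether the whole-line tiebreak is added. The key here, the first blank-separated word, is a hypothetical stand-in for a real `-k` key.

```rust
/// Hypothetical key: the first blank-separated word of the line.
fn key(line: &str) -> &str {
    line.split_whitespace().next().unwrap_or("")
}

/// Sort by `key`. With `stable`, equal keys keep input order (-s);
/// otherwise ties fall back to a whole-line comparison.
fn sort_lines(lines: &mut [&str], stable: bool) {
    lines.sort_by(|a, b| {
        let by_key = key(a).cmp(key(b));
        if stable { by_key } else { by_key.then_with(|| a.cmp(b)) }
    });
}
```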
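`-R` shuffles lines while keeping identical keys together. One way to achieve both is to sort by a randomly seeded hash of the key. Identical keys hash identically, so they end up adjacent, and a new seed gives a new order on each run. The sketch uses std's `RandomState` as the seed source and hashes the whole line as the key. How gsort itself does this, and how it uses `--random-source`, is not shown; `random_sort` is a hypothetical name.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

/// -R sketch: order lines by a randomly keyed hash of the line.
fn random_sort(lines: &mut [&str]) {
    let state = RandomState::new(); // fresh random keys per call
    lines.sort_by_cached_key(|line| {
        let mut h = state.build_hasher();
        line.hash(&mut h);
        // Tie-break on the line itself so a hash collision between two
        // different keys cannot interleave their groups.
        (h.finish(), line.to_string())
    });
}
```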
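`--parallel` sets the number of sort threads, defaulting to the number of CPUs. A common pattern is to sort contiguous chunks concurrently and then merge them. The sketch below uses scoped threads (Rust 1.63+) so workers can borrow the slice directly, followed by stable pairwise merges. It is a hypothetical illustration of the pattern rather than gsort's scheduler; `parallel_sort`, `merge_two` and `default_threads` are made-up names.

```rust
use std::thread;

/// Default worker count: CPUs reported by the OS, falling back to 1.
fn default_threads() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

/// Sort `data` with up to `threads` workers: each sorts one contiguous
/// chunk in place, then the sorted chunks are merged pairwise.
fn parallel_sort(data: &mut Vec<String>, threads: usize) {
    let threads = threads.max(1);
    let chunk_len = ((data.len() + threads - 1) / threads).max(1);
    // Phase 1: scoped threads may borrow disjoint chunks of `data`.
    thread::scope(|s| {
        for chunk in data.chunks_mut(chunk_len) {
            s.spawn(move || chunk.sort());
        }
    });
    // Phase 2: merge adjacent sorted chunks until one run remains.
    let mut runs: Vec<Vec<String>> = data.chunks(chunk_len).map(|c| c.to_vec()).collect();
    while runs.len() > 1 {
        let mut next = Vec::with_capacity((runs.len() + 1) / 2);
        let mut it = runs.into_iter();
        while let Some(a) = it.next() {
            match it.next() {
                Some(b) => next.push(merge_two(a, b)),
                None => next.push(a),
            }
        }
        runs = next;
    }
    *data = runs.pop().unwrap_or_default();
}

/// Stable two-way merge: on ties, the element from `a` goes first.
fn merge_two(a: Vec<String>, b: Vec<String>) -> Vec<String> {
    let mut out = Vec::with_capacity(a.len() + b.len());
    let (mut a, mut b) = (a.into_iter().peekable(), b.into_iter().peekable());
    loop {
        let take_a = match (a.peek(), b.peek()) {
            (Some(x), Some(y)) => x <= y,
            (Some(_), None) => true,
            (None, Some(_)) => false,
            (None, None) => break,
        };
        out.push(if take_a { a.next().unwrap() } else { b.next().unwrap() });
    }
    out
}
```

Copying chunks into `runs` keeps the sketch short. A tuned implementation would merge into a reused buffer instead.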
Dependencies
~9–39MB
~594K SLoC