Version 1.0.0 (Apr 28, 2025), Rust 2024 edition
Duplicate File Finder
A fast and efficient tool to detect duplicate files in a directory based on file content.
Features
- Partial Hashing for quick initial grouping (reads first 4 KB).
- Full Hashing for final confirmation (full file read or memory-mapped).
- Parallelized using Rayon for high performance.
- Progress Bars for visual feedback.
- Supports large datasets and very large files.
- Colored terminal output for better readability.
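The difference between the partial and full hashing passes can be sketched with the standard library alone. This is a minimal illustration, not the project's code: the function names `partial_hash` and `full_hash` are hypothetical, and a hand-written FNV-1a hash stands in for blake3 so the example has no external dependencies. It also reads files with plain buffered I/O rather than memory mapping.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Number of leading bytes covered by the partial hash (4 KB, as in the feature list).
const PARTIAL_LEN: u64 = 4096;

// FNV-1a 64-bit constants. ASSUMPTION: FNV-1a is only a dependency-free
// stand-in here; the project lists blake3 for hashing.
const FNV_OFFSET: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

/// Feed `bytes` into a running FNV-1a state and return the new state.
fn fnv1a_update(mut state: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        state ^= b as u64;
        state = state.wrapping_mul(FNV_PRIME);
    }
    state
}

/// Hash only the first 4 KB of a file (or the whole file if it is shorter).
fn partial_hash(path: &Path) -> io::Result<u64> {
    let mut head = Vec::with_capacity(PARTIAL_LEN as usize);
    File::open(path)?.take(PARTIAL_LEN).read_to_end(&mut head)?;
    Ok(fnv1a_update(FNV_OFFSET, &head))
}

/// Hash the entire file, streaming it through a fixed-size buffer.
fn full_hash(path: &Path) -> io::Result<u64> {
    let mut file = File::open(path)?;
    let mut buf = vec![0u8; 64 * 1024];
    let mut state = FNV_OFFSET;
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break; // end of file
        }
        state = fnv1a_update(state, &buf[..n]);
    }
    Ok(state)
}
```

Two files that share their first 4 KB but differ later get the same partial hash and different full hashes, which is why the partial pass can only nominate candidates and the full pass is needed to confirm them.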
Usage
1. Install Rust (if you don't have it)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
2. Clone and build the project
git clone https://github.com/yourusername/duplicate-file-finder.git
cd duplicate-file-finder
cargo build --release
3. Run the program
cargo run -- --path /path/to/your/directory
Or using the compiled release binary:
./target/release/duplicate-file-finder --path /path/to/your/directory
Example
cargo run -- --path ./Downloads
Sample output:
Scanning files...
Found 5321 files. Computing partial hashes...
Grouping files by partial hash...
421 candidate files after partial hashing. Computing full hashes...
Grouping by full hash...
❌ Duplicates found:
Group 1 (2 files) - Hash: d2f1d7e91c8b...
/path/to/file1.jpg
/path/to/file1_copy.jpg
Group 2 (3 files) - Hash: a34e1b1fe98d...
/path/to/doc1.pdf
/path/to/backup/doc1.pdf
/path/to/archive/old/doc1.pdf
Found 2 duplicate groups.
Summary: Scanned 5321 files in 1m 12s.
Command-Line Arguments
Argument | Description | Example |
---|---|---|
--path or -p | Directory to scan recursively | --path ./Documents |
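For illustration, the behavior of the single argument can be mimicked with the standard library. The project lists clap for argument parsing, so the hand-rolled parser below is only a stand-in sketch; the function name `parse_path_arg` and the error messages are hypothetical, and it handles only the space-separated form shown in the table.

```rust
use std::path::PathBuf;

/// Return the directory given after `--path` or `-p`.
/// ASSUMPTION: a std-only stand-in; the real tool uses clap.
fn parse_path_arg<I>(args: I) -> Result<PathBuf, String>
where
    I: IntoIterator<Item = String>,
{
    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        if arg == "--path" || arg == "-p" {
            // The value is the next argument; report an error if it is missing.
            return args
                .next()
                .map(PathBuf::from)
                .ok_or_else(|| format!("{arg} requires a directory argument"));
        }
    }
    Err("missing required argument: --path <DIR>".to_string())
}
```

In a binary this would typically be called as `parse_path_arg(std::env::args().skip(1))`, skipping the program name.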
How It Works
- Step 1: Scan all files under the given directory recursively.
- Step 2: Compute a partial hash (first 4 KB) of each file.
- Step 3: Group files with identical partial hashes.
- Step 4: Compute full hashes for the candidate groups.
- Step 5: Report groups of true duplicates based on full file content.
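This two-stage approach (a cheap partial hash first, then a full hash only for the remaining candidates) avoids reading most files in full, which keeps it fast even on very large folders.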
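The grouping logic behind Steps 3 to 5 can be sketched on in-memory data. This is an illustration under stated assumptions, not the project's implementation: the names `group_candidates` and `find_duplicates` are hypothetical, the raw bytes themselves are used as grouping keys where the real tool would use blake3 hashes, and everything runs serially rather than through Rayon.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Group items by a key and keep only groups with at least two members:
/// an item whose key is unique cannot have a duplicate.
fn group_candidates<T, K, F>(items: Vec<T>, key_of: F) -> Vec<Vec<T>>
where
    K: Eq + Hash,
    F: Fn(&T) -> K,
{
    let mut buckets: HashMap<K, Vec<T>> = HashMap::new();
    for item in items {
        buckets.entry(key_of(&item)).or_default().push(item);
    }
    buckets.into_values().filter(|g| g.len() > 1).collect()
}

/// Two-stage duplicate search over (name, content) pairs held in memory.
/// `prefix_len` plays the role of the 4 KB partial-hash window.
fn find_duplicates(files: Vec<(String, Vec<u8>)>, prefix_len: usize) -> Vec<Vec<String>> {
    // Stage 1: cheap key built from the leading bytes only.
    let candidates = group_candidates(files, |(_, data)| {
        data[..data.len().min(prefix_len)].to_vec()
    });

    // Stage 2: expensive key (the full content), computed only inside
    // the candidate groups that survived stage 1.
    let mut groups = Vec::new();
    for group in candidates {
        for dupes in group_candidates(group, |(_, data)| data.clone()) {
            let mut names: Vec<String> = dupes.into_iter().map(|(name, _)| name).collect();
            names.sort();
            groups.push(names);
        }
    }
    groups.sort(); // deterministic order, since HashMap iteration order is not
    groups
}
```

Given four entries where `a.txt` and `b.txt` hold identical bytes, `c.txt` shares only their prefix, and `d.txt` is unrelated, stage 1 discards `d.txt`, stage 2 discards `c.txt`, and a single group containing `a.txt` and `b.txt` remains.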
Dependencies
This project uses:
- blake3 for fast cryptographic hashing.
- clap for argument parsing.
- rayon for parallel processing.
- indicatif for progress bars.
- colored for colored terminal output.
- walkdir for recursive file walking.
- memmap2 for memory-mapping large files.
Cargo downloads and builds all dependencies automatically when you run cargo build.
License
This project is licensed under the MIT License. See LICENSE for more information.