5 releases
| Version | Released |
|---|---|
| 0.1.5 | Jul 11, 2025 |
| 0.1.4 | Jul 9, 2025 |
| 0.1.3 | Jul 8, 2025 |
| 0.1.2 | Jul 7, 2025 |
| 0.1.0 | Jul 7, 2025 |
Duplicate File Finder
A fast, parallelized CLI tool and library for detecting duplicate files by content. Designed for efficiency, usability, and cross-platform compatibility.
Features
- Recursively scans directories for duplicate files
- Detects duplicates using a multi-stage strategy:
  - Group by file size
  - Compare quick hash (first 8 KB using twox-hash)
  - Validate full content with SHA-256
- Generates detailed reports with metadata and potential space savings
- Supports progress indicators and structured logging
- Multithreaded using rayon for high performance
- Usable as both a CLI tool and a Rust library
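The multi-stage strategy listed above can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: `staged_duplicates`, `refine`, and `hash_prefix` are hypothetical names, and `DefaultHasher` stands in for both twox-hash and SHA-256 so the example has no external dependencies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs::{self, File};
use std::hash::Hasher;
use std::io::{self, Read};
use std::path::{Path, PathBuf};

/// Number of leading bytes covered by the quick hash (8 KB, as in the feature list).
const QUICK_HASH_BYTES: u64 = 8 * 1024;

/// Hashes at most `limit` bytes of a file, or the whole file when `limit` is `None`.
/// Assumption: `DefaultHasher` is a std-only stand-in for twox-hash and SHA-256.
fn hash_prefix(path: &Path, limit: Option<u64>) -> io::Result<u64> {
    let file = File::open(path)?;
    let mut reader: Box<dyn Read> = match limit {
        Some(n) => Box::new(file.take(n)),
        None => Box::new(file),
    };
    let mut hasher = DefaultHasher::new();
    let mut buf = [0u8; 8192];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break;
        }
        hasher.write(&buf[..n]);
    }
    Ok(hasher.finish())
}

/// Splits every candidate group by `key` and keeps only subgroups
/// that still contain two or more files.
fn refine<K, F>(groups: Vec<Vec<PathBuf>>, key: F) -> Vec<Vec<PathBuf>>
where
    K: std::hash::Hash + Eq,
    F: Fn(&Path) -> io::Result<K>,
{
    let mut out = Vec::new();
    for group in groups {
        let mut buckets: HashMap<K, Vec<PathBuf>> = HashMap::new();
        for path in group {
            // Files that cannot be read are skipped in this sketch.
            if let Ok(k) = key(path.as_path()) {
                buckets.entry(k).or_default().push(path);
            }
        }
        out.extend(buckets.into_values().filter(|g| g.len() > 1));
    }
    out
}

/// Runs the three stages in order: size, quick hash, full-content hash.
pub fn staged_duplicates(files: Vec<PathBuf>) -> Vec<Vec<PathBuf>> {
    let by_size = refine(vec![files], |p| fs::metadata(p).map(|m| m.len()));
    let by_quick = refine(by_size, |p| hash_prefix(p, Some(QUICK_HASH_BYTES)));
    refine(by_quick, |p| hash_prefix(p, None))
}
```

Each stage only looks at files that survived the previous one, so the expensive full-content pass runs on a small fraction of the input in a typical directory tree.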
Installation
Add to your project:
[dependencies]
duplicate_file_finder = "0.1"
Or install the CLI binary:
cargo install duplicate_file_finder
Usage
Command Line
duplicate_file_finder [--output <file_or_directory>]
duplicate_file_finder <directory> [--output <file_or_directory>]
duplicate_file_finder --directories <dir1> <dir2> ... [--output <file_or_directory>]
Example
duplicate_file_finder ~/Documents --output reports/
This scans ~/Documents and writes a human-readable report to reports/duplicate_file_report.txt.
Running duplicate_file_finder with no arguments scans the directory it is executed from and saves duplicate_file_report.txt in that same directory.
Options
| Option | Description |
|---|---|
| -h, --help | Show help message |
| --output <path> | Specify output file or directory for the report |
| -d, --directories <DIR> | Scan multiple directories as a single pool |
If the output path is a directory, the report is saved as duplicate_file_report.txt within that directory.
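The output-path rule can be sketched as a small helper. `resolve_report_path` is a hypothetical name, and the sketch assumes that only an existing directory counts as a directory; how the tool treats a path such as reports/ that does not exist yet is not stated here.

```rust
use std::path::{Path, PathBuf};

/// Default report file name, as documented above.
const REPORT_FILE_NAME: &str = "duplicate_file_report.txt";

/// Hypothetical helper: maps the optional `--output` value to the final report path.
pub fn resolve_report_path(output: Option<&Path>) -> PathBuf {
    match output {
        // No --output given: the report goes into the current directory.
        None => PathBuf::from(REPORT_FILE_NAME),
        // An existing directory: use the default file name inside it.
        Some(p) if p.is_dir() => p.join(REPORT_FILE_NAME),
        // Anything else is taken as the report file itself.
        Some(p) => p.to_path_buf(),
    }
}
```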
Sample Output
Duplicate File Finder Report
Generated by: alice
Start Time: 20250707 15:00:00
End Time: 20250707 15:00:42
Base Directory: /home/alice/Documents
Total Potential Space Savings: 1.43 GB
Size: 143.21 MB
/home/alice/Documents/archive/copy1.iso
/home/alice/Documents/archive/copy2.iso
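One plausible way to arrive at a "Total Potential Space Savings" figure is to count every copy beyond the first in each duplicate group. The sketch below assumes that definition and 1024-based units; `DuplicateGroup`, `potential_savings`, and `human_size` are hypothetical names, and the crate may compute or format the value differently.

```rust
/// One set of identical files: the size of a single copy and the number of copies.
pub struct DuplicateGroup {
    pub file_size: u64,
    pub copies: usize,
}

/// Bytes that could be reclaimed by keeping one copy per group (assumed definition).
pub fn potential_savings(groups: &[DuplicateGroup]) -> u64 {
    groups
        .iter()
        .map(|g| g.file_size * g.copies.saturating_sub(1) as u64)
        .sum()
}

/// Formats a byte count with two decimals, assuming 1024-based units.
pub fn human_size(bytes: u64) -> String {
    const UNITS: [&str; 5] = ["B", "KB", "MB", "GB", "TB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    format!("{:.2} {}", value, UNITS[unit])
}
```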
Library Usage
You can also integrate the crate into your own Rust projects:
use duplicate_file_finder::{find_duplicates, write_output, setup_logger};
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Set up logging before scanning.
    setup_logger()?;

    // Scan one base directory for duplicate files.
    let base_dir = Path::new("/some/path");
    let duplicates = find_duplicates(base_dir);

    // Write the report, passing a start timestamp and the scanned directories.
    write_output(duplicates, "report.txt", "20250707 15:00:00", &[base_dir.to_path_buf()])?;
    Ok(())
}
Logging
Logs are written to duplicate_finder.log and include timestamps and severity levels.
Platform Support
- Linux
- macOS
- Windows
Performance
The tool is optimized for performance using:
- Parallel iteration via rayon
- Incremental filtering (size → quick hash → full hash)
- Efficient I/O with buffered reading
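Buffered reading combined with parallel hashing might look like the following sketch. It is an illustration under stated assumptions: `std::thread::scope` stands in for rayon, `DefaultHasher` stands in for the real hash functions, the 64 KiB buffer size is arbitrary, and `hash_file_buffered` and `hash_all` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs::File;
use std::hash::Hasher;
use std::io::{self, BufRead, BufReader};
use std::path::{Path, PathBuf};
use std::thread;

/// Hashes a whole file through a 64 KiB buffer, so memory use stays
/// constant regardless of file size.
pub fn hash_file_buffered(path: &Path) -> io::Result<u64> {
    let mut reader = BufReader::with_capacity(64 * 1024, File::open(path)?);
    let mut hasher = DefaultHasher::new();
    loop {
        let chunk = reader.fill_buf()?;
        if chunk.is_empty() {
            break;
        }
        hasher.write(chunk);
        let used = chunk.len();
        reader.consume(used);
    }
    Ok(hasher.finish())
}

/// Hashes all paths on up to `workers` scoped threads.
/// Results are returned in the same order as `paths`.
pub fn hash_all(paths: &[PathBuf], workers: usize) -> Vec<io::Result<u64>> {
    let workers = workers.max(1);
    // Ceiling division; at least 1 because `chunks(0)` would panic.
    let chunk_len = ((paths.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = paths
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .map(|p| hash_file_buffered(p))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

With rayon, the manual chunking would typically be replaced by a parallel iterator over the paths, which also balances work across threads when file sizes vary widely.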
Development
Running Tests
cargo test
Building the Binary
cargo build --release
License
This project is licensed under the MIT License. See LICENSE for details.
Contributing
Contributions, issues, and feature requests are welcome!
- Fork the repository
- Create a new branch (git checkout -b feature/awesome)
- Commit your changes (git commit -am 'Add awesome feature')
- Push to the branch (git push origin feature/awesome)
- Create a new Pull Request
Made with ❤️ by Andrew Sims