Duplicate Destroyer
Command-line tool that finds duplicate directories and provides basic handling of them.
The Pitch
Have you ever backed up a backup folder of a backup folder? Have you then tried to deduplicate the tangled mess with a conventional deduplicator, only to find that you have to check 20 431 files manually? Then the DuDe is for you! DuDe finds the topmost duplicate folders in your filesystem and lets you effortlessly get rid of all your duplicates once and for all (or at least until the next backup...).
(Also, this is a small project intended as a learning experience with Rust.)
Installation
From Source
On Linux with Rust 1.64 or higher, install by running:
cargo install --features cli duplicate_destroyer
After the installation is finished, a dude binary will be available.
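To check that the binary is reachable (by default, Cargo installs it into ~/.cargo/bin, which should be on your PATH), you can run:
dude --version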
I have so far tested the installation on Fedora 35+ and on Raspberry Pi OS Bullseye.
On Ubuntu 22.04 LTS
There may be a missing build dependency, cc. To install the DuDe, first run
apt install build-essential
and then build from source:
cargo install --features cli duplicate_destroyer
Basic Usage
Warning: The crate is still quite new, and significant changes to the API are to be expected.
Scan a directory for duplicates
dude --path path/to/some/dir --path path/to/another/dir
Once the directories are scanned, DuDe prints the duplicate groups it found. E.g.:
Group 1/2
--------------------------------
0. "path/to/some/dir/some_dir/A"
1. "path/to/some/dir/other_dir/B"
--------------------------------
Size: 8kB
-----------
Select action and paths. (Or press Ctrl-C to exit program.)
[O]pen, Open [F]older, [D]elete, ReplaceWith[H]ardlink, ReplaceWith[S]oftlink, [N]othing
To act on the items found, type the letter of the action followed by the item numbers. E.g.
O 0 1
will open both items.
D 0
will (upon confirmation) delete "path/to/some/dir/some_dir/A" in our example.
Parallelism
To configure the number of threads used for calculating checksums, use the --jobs flag:
dude --path path/to/some/dir --jobs 3
When using the DuDe with a modern CPU and an external HDD, it is usually better to use only one thread (the current default), since the program is then IO-bound and parallel access to multiple files on the HDD can reduce the read speed.
Minimum-size
The minimum size of the duplicates returned can be specified with the --minimum-size argument. Note, however, that this does not significantly reduce the computation time, since the DuDe still computes checksums for all files that might have duplicates. This is done because even large directories might differ only in a few small files, and by disregarding small files completely we would risk losing some small but important data.
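For example, to only report duplicate groups of at least roughly 10 megabytes (the 10M spelling of the metric prefix is an assumption here; check dude --help for the exact accepted format):
dude --path path/to/some/dir --minimum-size 10M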
Hashing Algorithms
DuDe can use these hashing algorithms for comparing files:
- blake2 [default]
- sha3-256
- sha3-512
If the DuDe is running on a memory-constrained system, it is recommended to switch to the sha3-256 algorithm:
dude --path path/to/some/dir --algorithm "sha3-256"
CLI options
Usage: dude [OPTIONS] --path <PATH>
Options:
-p, --path <PATH> Add path to be scanned
-m, --minimum-size <MINIMUM_SIZE> Minimum size of duplicates considered (can have a metric prefix) [default=100]
-j, --jobs <JOBS> Number of jobs that run simultaneously [default=0]
--json-file <FILE> Output the list of duplicates to a file in json format
--no-interactive Disable interactive duplicate handling
-a, --algorithm <ALGORITHM> Hash algorithm used to compare files [possible values: blake2, sha3-256, sha3-512]
-h, --help Print help
-V, --version Print version
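For instance, to scan a directory without the interactive prompt and dump the duplicate groups to a JSON file (the file name duplicates.json is only an illustration):
dude --path path/to/some/dir --no-interactive --json-file duplicates.json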
The Library
If you do not like the user interface, you can write your own! The DuDe exposes a library with the core functionality; see the crate documentation on docs.rs.