#duplicate #directory #cleaner #finder #cli

bin+lib duplicate_destroyer

Finds and annihilates duplicate directories

8 releases

0.0.8 Dec 13, 2023
0.0.7 Nov 22, 2023
0.0.4 Jun 15, 2023
0.0.3 Apr 23, 2023
0.0.1 Feb 28, 2023

AGPL-3.0-or-later

105KB
1.5K SLoC

Duplicate Destroyer

A command-line tool that finds duplicate directories and offers basic ways to handle them.

The Pitch

Have you ever backed up a backup folder of a backup folder? Have you then tried to deduplicate the tangled mess with a conventional deduplicator, only to find that you have to check 20 431 files manually? Then the DuDe is for you! DuDe finds the topmost duplicate folders in your filesystem and lets you effortlessly get rid of all your duplicates once and for all (or at least until the next backup...).

(Also, this is a small project intended as a learning experience with Rust.)

Installation

From Source

On Linux with Rust 1.64 or higher, install by running:

cargo install --features cli duplicate_destroyer

After the installation is finished, a dude binary will be available.

So far I have tested the installation on Fedora 35+ and on Raspberry Pi OS Bullseye.

On Ubuntu 22.04 LTS

A build dependency, cc, may be missing. To install the DuDe, first run

apt install build-essential

and then build from source:

cargo install --features cli duplicate_destroyer

Basic Usage

Warning: The crate is still pretty new, and big changes to the API are to be expected.

Scan one or more directories for duplicates:

dude --path path/to/some/dir --path path/to/another/dir

Once the directories are scanned, DuDe prints the duplicate groups found, e.g.:

Group 1/2
--------------------------------
0. "path/to/some/dir/some_dir/A"
1. "path/to/some/dir/other_dir/B"
--------------------------------
Size: 8kB
-----------
Select action and paths. (Or press Ctrl-C to exit program.)
[O]pen, Open [F]older, [D]elete, ReplaceWith[H]ardlink, ReplaceWith[S]oftlink, [N]othing

To act on the items found, type the letter of the action followed by the path numbers. E.g.

O 0 1

will open both paths.

D 0

will (upon confirmation) delete "path/to/some/dir/some_dir/A" in our example.
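The reply format described above (one action letter followed by path numbers) can be sketched in Rust as follows. This is only an illustration, not DuDe's actual code: the Action enum, the parse_reply function and the choice to accept lowercase letters are all assumptions made for the example.

```rust
/// Actions offered by the interactive prompt (names are hypothetical).
#[derive(Debug, PartialEq)]
enum Action {
    Open,
    OpenFolder,
    Delete,
    ReplaceWithHardlink,
    ReplaceWithSoftlink,
    Nothing,
}

/// Parses a reply such as "D 0" into an action and the selected path numbers.
/// `group_len` is the number of paths listed in the current group.
fn parse_reply(reply: &str, group_len: usize) -> Result<(Action, Vec<usize>), String> {
    let mut tokens = reply.split_whitespace();
    let letter = tokens.next().ok_or("empty reply")?;
    // Assumption: letters are matched case-insensitively.
    let action = match letter.to_ascii_uppercase().as_str() {
        "O" => Action::Open,
        "F" => Action::OpenFolder,
        "D" => Action::Delete,
        "H" => Action::ReplaceWithHardlink,
        "S" => Action::ReplaceWithSoftlink,
        "N" => Action::Nothing,
        other => return Err(format!("unknown action: {other}")),
    };
    let mut indices = Vec::new();
    for token in tokens {
        let index: usize = token
            .parse()
            .map_err(|_| format!("not a number: {token}"))?;
        // Reject numbers that do not refer to a listed path.
        if index >= group_len {
            return Err(format!("index {index} out of range"));
        }
        indices.push(index);
    }
    Ok((action, indices))
}
```

With the two-path group from the example, parse_reply("O 0 1", 2) yields the Open action with indices 0 and 1, while a reply such as "D 5" is rejected because the group has no path number 5.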

Parallelism

To configure the number of threads used for calculating checksums, use the --jobs flag:

dude --path path/to/some/dir --jobs 3

When using the DuDe with a modern CPU and an external HDD, it is usually better to use only one thread (as is the default now), since the program is then IO-bound and parallel access to multiple files on the HDD can reduce the read speed.
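To make the idea of a bounded number of checksum jobs concrete, here is a minimal Rust sketch using scoped threads from the standard library. It is not how DuDe is implemented: checksum and checksum_all are hypothetical names, files are in-memory byte buffers instead of paths on disk, std's DefaultHasher stands in for a real hash such as blake2, and a jobs value of 0 is simply treated as 1, which may differ from what the tool does.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::thread;

/// Stand-in checksum; a real tool would use a cryptographic hash.
fn checksum(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Checksums `files` with at most `jobs` worker threads, preserving input order.
fn checksum_all(files: &[Vec<u8>], jobs: usize) -> Vec<u64> {
    // Assumption of this sketch: 0 jobs means a single worker.
    let jobs = jobs.max(1);
    // Ceiling division, so every file lands in exactly one chunk.
    let chunk_len = ((files.len() + jobs - 1) / jobs).max(1);
    thread::scope(|scope| {
        // Spawn one worker per chunk before joining any of them.
        let workers: Vec<_> = files
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|f| checksum(f)).collect::<Vec<u64>>())
            })
            .collect();
        // Joining in spawn order keeps the results aligned with `files`.
        workers
            .into_iter()
            .flat_map(|worker| worker.join().expect("worker panicked"))
            .collect()
    })
}
```

Because the buffers here are already in memory, more jobs only add CPU parallelism. On a spinning disk the reads themselves compete for the drive, which is why a single job can be faster there.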

Minimum-size

The minimum size of the duplicates returned can be specified with the --minimum-size argument. Note, however, that this will not significantly reduce the computation time, since the DuDe still computes the checksum of every file that might have a duplicate. This is done because even large directories might differ only in some small files, and by disregarding the small files completely we would risk losing small but important data.
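The following Rust sketch illustrates why the size filter is applied only after hashing. It is not DuDe's actual implementation: the Node type and both function names are hypothetical, the tree lives in memory rather than on disk, and std's DefaultHasher stands in for a cryptographic hash. Unlike DuDe, it also reports nested duplicates instead of only the topmost ones.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, HashMap};
use std::hash::{Hash, Hasher};

/// Hypothetical in-memory stand-in for a filesystem entry.
enum Node {
    File(Vec<u8>),
    Dir(BTreeMap<String, Node>),
}

/// Returns (digest, total size in bytes) of `node` and records every
/// directory it visits in `seen`, keyed by digest.
fn digest(path: &str, node: &Node, seen: &mut HashMap<u64, Vec<(String, u64)>>) -> (u64, u64) {
    let mut hasher = DefaultHasher::new();
    match node {
        Node::File(bytes) => {
            bytes.hash(&mut hasher);
            (hasher.finish(), bytes.len() as u64)
        }
        Node::Dir(children) => {
            let mut size = 0;
            // BTreeMap iterates in name order, so the digest is deterministic.
            for (name, child) in children {
                let (child_digest, child_size) = digest(&format!("{path}/{name}"), child, seen);
                // Every child contributes, however small it is.
                name.hash(&mut hasher);
                child_digest.hash(&mut hasher);
                size += child_size;
            }
            let dir_digest = hasher.finish();
            seen.entry(dir_digest).or_default().push((path.to_string(), size));
            (dir_digest, size)
        }
    }
}

/// Groups directories by digest first, then applies the size filter.
fn duplicate_dirs(root: &Node, minimum_size: u64) -> Vec<Vec<String>> {
    let mut seen = HashMap::new();
    digest("root", root, &mut seen);
    let mut groups: Vec<Vec<String>> = seen
        .into_values()
        .filter(|dirs| dirs.len() > 1 && dirs[0].1 >= minimum_size)
        .map(|dirs| dirs.into_iter().map(|(path, _)| path).collect())
        .collect();
    groups.sort();
    groups
}
```

If two directories share a large file but differ in a one-byte note, their digests differ and they are not reported as duplicates. Skipping small files during hashing would hide exactly that difference.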

Hashing Algorithms

DuDe can use these hashing algorithms for comparing files:

  • blake2 [default]
  • sha3-256
  • sha3-512

If the DuDe is running on a memory-constrained system, it is recommended to switch to the sha3-256 algorithm:

dude --path path/to/some/dir --algorithm "sha3-256"
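As a rough illustration of the algorithm names and of why a shorter digest can help on a memory-constrained system, here is a small Rust sketch. The Algorithm enum and its methods are hypothetical and are not taken from DuDe's library. The digest lengths for SHA3-256 (32 bytes) and SHA3-512 (64 bytes) follow from the algorithm definitions. The blake2 length is left unknown because this document does not say which BLAKE2 variant is used, and the idea that stored digests dominate memory use is an assumption.

```rust
use std::str::FromStr;

/// Hash algorithms listed in the CLI help (enum name is hypothetical).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Algorithm {
    Blake2,
    Sha3_256,
    Sha3_512,
}

impl FromStr for Algorithm {
    type Err = String;

    /// Accepts exactly the names listed as possible values for --algorithm.
    fn from_str(name: &str) -> Result<Self, Self::Err> {
        match name {
            "blake2" => Ok(Algorithm::Blake2),
            "sha3-256" => Ok(Algorithm::Sha3_256),
            "sha3-512" => Ok(Algorithm::Sha3_512),
            other => Err(format!("unknown algorithm: {other}")),
        }
    }
}

impl Algorithm {
    /// Digest length in bytes; None where this document gives no basis for a value.
    fn digest_bytes(self) -> Option<usize> {
        match self {
            Algorithm::Blake2 => None,
            Algorithm::Sha3_256 => Some(32),
            Algorithm::Sha3_512 => Some(64),
        }
    }

    /// Bytes needed to keep one digest per file for `files` files.
    fn digest_table_bytes(self, files: usize) -> Option<usize> {
        self.digest_bytes().map(|len| len * files)
    }
}
```

Under these assumptions, keeping one digest for each of a million files takes about 32 MB with sha3-256 and about 64 MB with sha3-512.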

CLI options

Usage: dude [OPTIONS] --path <PATH>

Options:
  -p, --path <PATH>                  Add path to be scanned
  -m, --minimum-size <MINIMUM_SIZE>  Minimum size of duplicates considered (can have a metric prefix) [default=100]
  -j, --jobs <JOBS>                  Number of jobs that run simultaneously [default=0]
      --json-file <FILE>             Output the list of duplicates to a file in json format
      --no-interactive               Disable interactive duplicate handling
  -a, --algorithm <ALGORITHM>        Hash algorithm used to compare files [possible values: blake2, sha3-256, sha3-512]
  -h, --help                         Print help
  -V, --version                      Print version

The Library

If you do not like the user interface, you can write your own! The DuDe exposes a library with the core functionality. See the documentation here.
