Version 0.1.0, released Sep 26, 2025 (1 unstable release). Uses the Rust 2024 edition.
Fast duplicate detection and remediation
Usage
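rsdupes -i <extension-to-include> -e <extension-to-exclude> -m <minimum-size> <Directory>...
-i can be given more than once, and any number of directories can be passed. In the examples below, -j <file> writes the listing of duplicates to a JSON file and -h hard-links duplicates together.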
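The -i, -e and -m options could combine into a single per-file check along these lines. This is a minimal sketch, not the crate's code: the function name passes_filters is hypothetical, and it assumes that an empty include list keeps every extension, that extensions match case-insensitively without the leading dot, that exclusion wins over inclusion, and that the minimum size is in bytes.

```rust
use std::path::Path;

/// Hypothetical per-file check combining -i (include), -e (exclude) and -m (minimum size).
/// Assumed semantics: an empty include list keeps every extension, extensions are
/// compared case-insensitively without the leading dot, exclusion wins, sizes are bytes.
fn passes_filters(path: &Path, size: u64, include: &[&str], exclude: &[&str], min_size: u64) -> bool {
    if size < min_size {
        return false;
    }
    // Files without a (UTF-8) extension map to "", so a non-empty include list skips them.
    let ext = path.extension().and_then(|e| e.to_str()).unwrap_or("");
    if exclude.iter().any(|x| x.eq_ignore_ascii_case(ext)) {
        return false;
    }
    include.is_empty() || include.iter().any(|i| i.eq_ignore_ascii_case(ext))
}

fn main() {
    let include = ["png", "jpg"];
    let exclude = ["xcf"];
    for (name, size) in [("a.PNG", 4096), ("b.txt", 4096), ("c.jpg", 10), ("d.xcf", 4096)] {
        let keep = passes_filters(Path::new(name), size, &include, &exclude, 1024);
        println!("{name}: {keep}");
    }
}
```

Filtering by extension and size before any hashing keeps the expensive stages from ever opening files the user did not ask about.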
Detecting Duplicates Without Action
To write a listing of duplicates without acting on them, pass -j with an output file. This is useful for figuring out what is duplicated before doing anything risky.
rsdupes -j duplicates.json ~
Playing with Water on Steam
I have a lot of mods installed for some games, and did you know that some of them ship duplicate assets? For RimWorld, for example, this should help slightly.
rsdupes -i jpg -i png -i xml -i dds -h ~/.local/share/Steam
Playing with Fire on Steam
Steam tends to install many copies of Mono for various games. These can add up to several megabytes and can be hard-linked for a minor reduction in disk usage.
This may have deleterious effects should a game's copy of Unity move to a newer version: hard-linked files share the same data on disk, so an update that writes into one copy in place changes every linked copy. This will also deduplicate the various Visual C++ redistributables, which might be... less hazardous.
# Probably don't run this
rsdupes -i exe -i dll -i xml -h ~/.local/share/Steam
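Hard-linking replaces a duplicate's directory entry with another name for the kept file, so both paths share one copy of the data. That is also why the warning above matters: an in-place write through either path is visible through both. A minimal sketch of one way to do the replacement, assuming a temporary link followed by std::fs::rename so the duplicate's path never disappears; replace_with_hard_link and the .rsdupes-tmp suffix are hypothetical and not necessarily what rsdupes does.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: turn `duplicate` into a hard link to `keep`.
/// The link is first created under a temporary name next to `duplicate` and then
/// renamed over it, so `duplicate`'s path exists throughout.
fn replace_with_hard_link(keep: &Path, duplicate: &Path) -> io::Result<()> {
    let mut tmp_name = duplicate
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?
        .to_os_string();
    tmp_name.push(".rsdupes-tmp"); // hypothetical suffix
    let tmp = duplicate.with_file_name(tmp_name);

    fs::hard_link(keep, &tmp)?; // fails if `tmp` already exists, so nothing is clobbered
    if let Err(e) = fs::rename(&tmp, duplicate) {
        let _ = fs::remove_file(&tmp);
        return Err(e);
    }
    // If both paths were already links to the same file, POSIX rename() does nothing
    // and `tmp` is left behind; remove it (the usual NotFound error is ignored).
    let _ = fs::remove_file(&tmp);
    Ok(())
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("rsdupes-link-demo-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let keep = dir.join("mono-a.dll");
    let dup = dir.join("mono-b.dll");
    fs::write(&keep, b"v1")?;
    fs::write(&dup, b"v1")?;

    replace_with_hard_link(&keep, &dup)?;

    // Overwrite `keep` in place (same file on disk): the change shows up through `dup` too.
    let mut f = OpenOptions::new().write(true).truncate(true).open(&keep)?;
    f.write_all(b"v2")?;
    drop(f);
    println!("dup now reads {:?}", fs::read_to_string(&dup)?);

    fs::remove_dir_all(&dir)?;
    Ok(())
}
```

Hard links only work within a single filesystem, so a duplicate on another mount cannot be linked this way.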
Media File deduplication
You may have a lot of media stored; I sure do. This might be helpful, and it should be safe.
rsdupes -i png -i jpg -i gif -i webp -i webm -i mp4 -i mkv -h <MEDIA-PATH>
Current Functionality
- Invoke on any number of directories
- Groups files into buckets by size
- Splits those buckets by a partial SHA-256 hash
- Splits the partially hashed buckets by a full hash of each file
- Compares the files in each fully hashed bucket byte by byte, merging confirmed duplicates into buckets. Permissions are also compared at this stage; currently, permission checks are only effective on Unix-like systems, and are likely incomplete there.
- (Optionally) links duplicates together
- (Optionally) writes a json file
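The detection stages listed above can be sketched as successive bucket splits, where each stage only looks at files that survived the cheaper one before it. This is a single-threaded sketch under stated assumptions, not rsdupes's implementation: the partial hash is assumed to cover the first 4 KiB, std's DefaultHasher (SipHash) stands in for SHA-256 to avoid dependencies, and permissions are compared with std's Permissions equality, which on Unix compares the mode bits and on Windows only the read-only flag. All function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs::{self, File};
use std::hash::{Hash, Hasher};
use std::io::{self, Read};
use std::path::{Path, PathBuf};

// Assumption: how many leading bytes the partial hash covers.
const PARTIAL_BYTES: u64 = 4096;

/// Hash at most `limit` bytes of a file. DefaultHasher stands in for SHA-256 here.
fn hash_prefix(path: &Path, limit: u64) -> io::Result<u64> {
    let mut reader = File::open(path)?.take(limit);
    let mut hasher = DefaultHasher::new();
    let mut buf = [0u8; 8192];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break;
        }
        hasher.write(&buf[..n]);
    }
    Ok(hasher.finish())
}

/// Split one bucket by `key`, keeping only sub-buckets with two or more files.
/// Files whose key cannot be computed (e.g. unreadable ones) are dropped.
fn split_by<K, F>(bucket: Vec<PathBuf>, mut key: F) -> Vec<Vec<PathBuf>>
where
    K: Eq + Hash,
    F: FnMut(&PathBuf) -> io::Result<K>,
{
    let mut map: HashMap<K, Vec<PathBuf>> = HashMap::new();
    for path in bucket {
        if let Ok(k) = key(&path) {
            map.entry(k).or_default().push(path);
        }
    }
    map.into_values().filter(|b| b.len() > 1).collect()
}

/// Final check: same permissions and identical bytes. Reads both files fully,
/// which keeps the sketch short; a real tool would compare in chunks.
fn identical(a: &Path, b: &Path) -> io::Result<bool> {
    let same_perms = fs::metadata(a)?.permissions() == fs::metadata(b)?.permissions();
    Ok(same_perms && fs::read(a)? == fs::read(b)?)
}

/// Byte-by-byte stage: partition a bucket into classes of truly identical files.
fn confirm(bucket: Vec<PathBuf>) -> Vec<Vec<PathBuf>> {
    let mut classes: Vec<Vec<PathBuf>> = Vec::new();
    for path in bucket {
        // Compare against the first member of each existing class.
        match classes.iter().position(|c| identical(&c[0], &path).unwrap_or(false)) {
            Some(i) => classes[i].push(path),
            None => classes.push(vec![path]),
        }
    }
    classes.into_iter().filter(|c| c.len() > 1).collect()
}

fn find_duplicates(paths: Vec<PathBuf>) -> Vec<Vec<PathBuf>> {
    let mut duplicates = Vec::new();
    for by_size in split_by(paths, |p| fs::metadata(p).map(|m| m.len())) {
        for by_partial in split_by(by_size, |p| hash_prefix(p, PARTIAL_BYTES)) {
            for by_full in split_by(by_partial, |p| hash_prefix(p, u64::MAX)) {
                duplicates.extend(confirm(by_full));
            }
        }
    }
    duplicates
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("rsdupes-stage-demo-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    // c.txt matches a.txt in size but not content; d.txt differs in size.
    let files = [("a.txt", "hello"), ("b.txt", "hello"), ("c.txt", "world"), ("d.txt", "hello!")];
    let mut paths = Vec::new();
    for (name, contents) in files {
        let p = dir.join(name);
        fs::write(&p, contents)?;
        paths.push(p);
    }
    for group in find_duplicates(paths) {
        println!("{:?}", group);
    }
    fs::remove_dir_all(&dir)?;
    Ok(())
}
```

The ordering matters because each stage is more expensive than the last: a size comparison needs only metadata, a partial hash reads a few kilobytes, and only files that agree on both get read in full.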
Current JSON Structure
{
  "<file_size>": {
    "<HASH>": {
      "<PATH>": [
        "<DUPLICATE>",
        "<DUPLICATE2>"
      ],
      ...
    },
    ...
  }
}
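Read as nested JSON objects (file size, then hash, then the path that is kept, mapping to a list of its duplicates), the listing can be produced without dependencies. A minimal sketch under that reading: the Report alias, json_str and to_json are hypothetical names, the hash value is a placeholder, and a real implementation would more likely use serde_json. The hand-written escaping handles only quotes, backslashes and control characters, which is what JSON strings require.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

// size -> hash -> kept path -> duplicate paths (hypothetical layout).
type Report = BTreeMap<u64, BTreeMap<String, BTreeMap<String, Vec<String>>>>;

/// Minimal JSON string encoding: escape quotes, backslashes and control characters.
fn json_str(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => write!(out, "\\u{:04x}", c as u32).unwrap(),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Serialize the report as compact JSON; JSON object keys must be strings,
/// so the size is written as a string key.
fn to_json(report: &Report) -> String {
    let mut out = String::from("{");
    for (i, (size, hashes)) in report.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        write!(out, "{}:{{", json_str(&size.to_string())).unwrap();
        for (j, (hash, kept)) in hashes.iter().enumerate() {
            if j > 0 {
                out.push(',');
            }
            write!(out, "{}:{{", json_str(hash)).unwrap();
            for (k, (keep, dups)) in kept.iter().enumerate() {
                if k > 0 {
                    out.push(',');
                }
                let list: Vec<String> = dups.iter().map(|d| json_str(d)).collect();
                write!(out, "{}:[{}]", json_str(keep), list.join(",")).unwrap();
            }
            out.push('}');
        }
        out.push('}');
    }
    out.push('}');
    out
}

fn main() {
    let mut report = Report::new();
    report
        .entry(5)
        .or_default()
        .entry("abc123".to_string()) // placeholder hash
        .or_default()
        .insert("/media/a.png".to_string(), vec!["/media/copy of a.png".to_string()]);
    println!("{}", to_json(&report));
}
```

BTreeMap is used instead of HashMap so the output is sorted and stable between runs, which makes two listings easy to diff.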
Current Actor Model
┌────────────────────────────┐
│ Filesystem recursion actor │
└─────────────┬──────────────┘    ┌───────────────────┐
              ├───────────────────┤FileExtensionFilter│
              │ (File Paths)      └─────────┬─────────┘
              │                             │
    ┌─────────┴────────┐                    │
    │Inode Deduplicator├────────────────────┘
    └─────────┬────────┘
              │ (File Paths)
    ┌─────────┴────────────┐
    │ Size Duplicate buffer│
    └─────────┬────────────┘
              │ (File Paths)
     ┌────────┴───────┐
     │ Partial hasher ├──────────────────────────┐
     └────────┬───────┘                          │
              │ (Filesize,              ┌────────┴─────────┐
              │  PartialHash,           │StatusDisplayActor│
              │  Path)                  └┬──┬──┬──┬────────┘
┌─────────────┴───────────────┐          ↑  ↑  │  │
│Partial Hash duplicate buffer│          │  │  │  │
└─────────────┬───────────────┘          │  │  │  │
              │                          │  │  │  │
        ┌─────┴─────┐                    │  │  │  │
        │Full Hasher├────────────────────┘  │  │  │
        └─────┬─────┘                       │  │  │
              │ (Size, Hash, Path)          │  │  │
      ┌───────┴────────┐                    │  │  │
      │Full Hash Filter├────────────────────┘  │  │
      └───────┬────────┘                       │  │
              │ (Size, Hash, Path)             │  │
        ┌─────┴─────────────────┐              │  │
        │Byte-by-byte comparison├──────────────┘  │
        └─────┬─────────────────┘                 │
              ├────────┐                          │
              │        │                          │
        ┌─────┴────┐   │                          │
        │HardLinker├───│──────────────────────────┘
        └─────┬────┘   │
              ├────────┘
        ┌─────┴────┐
        │JsonDumper│
        └──────────┘
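The actor style in the diagram can be sketched with plain threads connected by std::sync::mpsc channels. This is a minimal sketch with only three actors (a source stand-in, the size duplicate buffer, and a status actor); the Status message type, the stage functions and run are hypothetical, and the README does not say whether rsdupes uses threads, async tasks or an actor crate. The hashing and comparison stages would slot in the same way, each reading from one channel and writing to the next.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

/// Hypothetical progress messages sent to the status actor.
enum Status {
    Seen,
    Forwarded(usize),
}

/// Size duplicate buffer: it can only decide once its input channel closes,
/// because a later file might share a size with an earlier one.
fn size_buffer(input: Receiver<(String, u64)>, output: Sender<Vec<String>>, status: Sender<Status>) {
    let mut by_size: HashMap<u64, Vec<String>> = HashMap::new();
    for (path, size) in input {
        status.send(Status::Seen).ok();
        by_size.entry(size).or_default().push(path);
    }
    for group in by_size.into_values().filter(|g| g.len() > 1) {
        status.send(Status::Forwarded(group.len())).ok();
        output.send(group).ok();
    }
    // `output` and `status` are dropped here, closing those channels downstream.
}

/// Status display actor: drains progress messages until every sender is gone.
fn status_display(input: Receiver<Status>) -> (usize, usize) {
    let (mut seen, mut forwarded) = (0, 0);
    for msg in input {
        match msg {
            Status::Seen => seen += 1,
            Status::Forwarded(n) => forwarded += n,
        }
    }
    (seen, forwarded)
}

fn run(files: Vec<(String, u64)>) -> (Vec<Vec<String>>, (usize, usize)) {
    let (path_tx, path_rx) = mpsc::channel();
    let (group_tx, group_rx) = mpsc::channel();
    let (status_tx, status_rx) = mpsc::channel();

    let status = thread::spawn(move || status_display(status_rx));
    let buffer = thread::spawn(move || size_buffer(path_rx, group_tx, status_tx));

    // Stand-in for the filesystem recursion actor: feed (path, size) pairs.
    for f in files {
        path_tx.send(f).unwrap();
    }
    drop(path_tx); // end of input lets the size buffer flush its groups

    let groups: Vec<Vec<String>> = group_rx.iter().collect();
    buffer.join().unwrap();
    (groups, status.join().unwrap())
}

fn main() {
    let files = vec![("a.png".to_string(), 10), ("b.png".to_string(), 10), ("c.png".to_string(), 7)];
    let (groups, (seen, forwarded)) = run(files);
    println!("groups: {groups:?}, seen {seen}, forwarded {forwarded}");
}
```

Shutdown falls out of channel ownership: when an actor returns, its senders are dropped, so each downstream loop ends on its own, and the status actor stops once every stage that reports to it has finished.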