2 releases
0.1.1 (new) | Dec 29, 2024
0.1.0 | Dec 29, 2024
catfish 🥸
Because sometimes files pretend to be something they’re not.
(No matter the name or location, catfish will find out if they’re the same.)
I needed this functionality today and threw this tool together. I decided to share it here, in case someone else finds it useful!
Why “catfish”?
- cat is a Unix tool.
- fish … well, we’re fishing for the truth in your file system.
- A catfish is a sneaky creature, just like files that might be identical under different names or locations.
- catfish 🥸 unmasks duplicates for what they really are.
What does it do?
catfish recursively scans two folders (let’s call them “left” and “right”), hashes every file with SHA-256, and lists the files in “right” whose content does not appear anywhere in “left”.
- If a file in the right folder has the same content (hash) as any file in the left folder, it won’t be listed.
- We don’t check the file’s location in the left folder. Any matching hash anywhere in “left” is enough to exclude it.
- We don’t check for duplicates within the left folder itself: if “left” has duplicates, that’s not our concern.
- We can optionally ignore duplicates in the right folder, so that only the first occurrence of any given hash in “right” is shown.
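The scanning and hashing step described above can be sketched in Rust using only the standard library. This is a minimal sketch under stated assumptions, not taken from catfish’s source: to stay dependency-free it carries its own small SHA-256 implementation (a real tool would more likely use an existing hashing crate), it reads each file fully into memory, it visits directory entries in sorted order for determinism, and it skips symlinks. The names `sha256_hex`, `collect_files`, and `hash_tree` are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// SHA-256 round constants (FIPS 180-4).
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

/// Hex-encoded SHA-256 digest of `data` (hypothetical helper, FIPS 180-4 algorithm).
fn sha256_hex(data: &[u8]) -> String {
    let mut h: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];
    // Padding: a single 1 bit, zeros up to 56 mod 64 bytes, then the bit length (big-endian).
    let bit_len = (data.len() as u64).wrapping_mul(8);
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    for block in msg.chunks(64) {
        // Expand the 16 input words into the 64-word message schedule.
        let mut w = [0u32; 64];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }
        // 64 compression rounds over the working variables.
        let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut hh] = h;
        for i in 0..64 {
            let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & f) ^ (!e & g);
            let t1 = hh.wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let maj = (a & b) ^ (a & c) ^ (b & c);
            hh = g; g = f; f = e; e = d.wrapping_add(t1);
            d = c; c = b; b = a; a = t1.wrapping_add(s0.wrapping_add(maj));
        }
        for (x, y) in h.iter_mut().zip([a, b, c, d, e, f, g, hh].iter()) {
            *x = x.wrapping_add(*y);
        }
    }
    h.iter().map(|x| format!("{:08x}", x)).collect()
}

/// Recursively collects regular files under `dir` in sorted path order.
/// Symlinks are neither followed nor listed in this sketch.
fn collect_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    let mut entries = fs::read_dir(dir)?
        .map(|entry| entry.map(|e| e.path()))
        .collect::<io::Result<Vec<PathBuf>>>()?;
    entries.sort();
    for path in entries {
        let file_type = fs::symlink_metadata(&path)?.file_type();
        if file_type.is_dir() {
            collect_files(&path, out)?;
        } else if file_type.is_file() {
            out.push(path);
        }
    }
    Ok(())
}

/// Hashes every regular file under `root`, returning (path, hex digest) pairs.
/// Each file is read fully into memory, which is simple but not ideal for huge files.
fn hash_tree(root: &Path) -> io::Result<Vec<(PathBuf, String)>> {
    let mut files = Vec::new();
    collect_files(root, &mut files)?;
    files
        .into_iter()
        .map(|path| -> io::Result<(PathBuf, String)> {
            let digest = sha256_hex(&fs::read(&path)?);
            Ok((path, digest))
        })
        .collect()
}
```

Calling `hash_tree` once for each folder gives the two inputs the filtering rules above operate on: a set of hashes from “left” and an ordered list of (path, hash) pairs from “right”.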
Backstory:
Some time ago, I switched from cloud provider X to cloud provider Y. I had both drives fully synced locally (a full copy, not a "lite" sync), so I copied all my files from X to Y and then turned off sync for X. But I forgot to delete the local X folder, and ended up adding new files to it by mistake. When I went to delete it, a simple path comparison with Y wasn't enough: I had moved and renamed files in Y, so comparing paths would have flagged many files as missing even though their content was still there. What I really needed was to find out which files in X didn't exist anywhere in Y, so I could copy them over if necessary and then safely delete X without losing anything important. In catfish terms, Y is the left folder and X is the right one.
Installation
- Ensure you have Rust and Cargo installed.
- Run:
  cargo install catfish
  or clone this repo and run:
  git clone https://github.com/samvdst/catfish.git
  cd catfish
  cargo install --path .
- That’s it! You can now run catfish from anywhere.
Usage
catfish [OPTIONS] <LEFT_FOLDER> <RIGHT_FOLDER>
-i, --ignore-duplicates: if there are multiple files with the same hash in RIGHT_FOLDER, only list the first occurrence.
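A hypothetical, standard-library-only sketch of parsing this command line follows. The `Args` struct and `parse_args` function are illustrative assumptions; catfish itself may parse its arguments differently (for example with an argument-parsing crate). In this sketch options are accepted anywhere, so `catfish -i foo bar` and `catfish foo bar --ignore-duplicates` parse the same way, and anything else starting with `-` is rejected.

```rust
/// Parsed command line (hypothetical; catfish's real internals may differ).
#[derive(Debug, PartialEq)]
struct Args {
    left: String,
    right: String,
    ignore_duplicates: bool,
}

/// Parses `[OPTIONS] <LEFT_FOLDER> <RIGHT_FOLDER>`, with options allowed in any position.
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Args, String> {
    let mut ignore_duplicates = false;
    let mut positional = Vec::new();
    for arg in args {
        if arg == "-i" || arg == "--ignore-duplicates" {
            ignore_duplicates = true;
        } else if arg.starts_with('-') {
            return Err(format!("unknown option: {}", arg));
        } else {
            positional.push(arg);
        }
    }
    // Exactly two folders are required: LEFT_FOLDER, then RIGHT_FOLDER.
    match positional.as_slice() {
        [left, right] => Ok(Args {
            left: left.clone(),
            right: right.clone(),
            ignore_duplicates,
        }),
        _ => Err("usage: catfish [OPTIONS] <LEFT_FOLDER> <RIGHT_FOLDER>".to_string()),
    }
}
```

In a binary, this would be called as `parse_args(std::env::args().skip(1))`, skipping the program name.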
Example
Suppose we have two folders: foo (left) and bar (right). In bar, we have a file that appears twice with identical content.
catfish foo bar
Files in "bar" but not in "foo":
f2d30353acf140ed51b1343368255c1201a7ee898acd60b25e207ff75555e12c bar/example.txt
f2d30353acf140ed51b1343368255c1201a7ee898acd60b25e207ff75555e12c bar/example_dupe.txt
af89f7d49b0c8ded732a9a2b3aff738cd1a3c1cd0d3635742adfee47faa31cba bar/another_file.txt
If we then run:
catfish foo bar --ignore-duplicates
Files in "bar" but not in "foo":
f2d30353acf140ed51b1343368255c1201a7ee898acd60b25e207ff75555e12c bar/example.txt
af89f7d49b0c8ded732a9a2b3aff738cd1a3c1cd0d3635742adfee47faa31cba bar/another_file.txt
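The filtering behind this example can be sketched as follows, assuming the (hash, path) pairs for “bar” have already been computed in traversal order and the hashes from “foo” have been collected into a set. `only_in_right` and `report` are hypothetical names, and the single space between hash and path simply mirrors the output shown above. Which copy counts as the “first occurrence” depends on traversal order; since the example lists `example.txt` before `another_file.txt`, the real tool does not appear to list paths alphabetically.

```rust
use std::collections::HashSet;

/// Keeps the (hash, path) entries of `right` whose hash never occurs in `left_hashes`.
/// With `ignore_duplicates`, only the first remaining entry per hash is kept, where
/// "first" means first in the order `right` was traversed.
fn only_in_right(
    left_hashes: &HashSet<String>,
    right: &[(String, String)],
    ignore_duplicates: bool,
) -> Vec<(String, String)> {
    let mut already_listed: HashSet<&str> = HashSet::new();
    let mut out = Vec::new();
    for (hash, path) in right {
        if left_hashes.contains(hash) {
            continue; // identical content exists somewhere in the left folder
        }
        // `insert` returns false if this hash has already been listed.
        if ignore_duplicates && !already_listed.insert(hash.as_str()) {
            continue;
        }
        out.push((hash.clone(), path.clone()));
    }
    out
}

/// Formats the listing in the style of the example output.
fn report(left_name: &str, right_name: &str, rows: &[(String, String)]) -> String {
    let mut out = format!("Files in \"{}\" but not in \"{}\":\n", right_name, left_name);
    for (hash, path) in rows {
        out.push_str(&format!("{} {}\n", hash, path));
    }
    out
}
```

Feeding in the three entries from “bar” shown above yields all three rows when `ignore_duplicates` is false and drops `bar/example_dupe.txt` when it is true; any entry whose hash also appears in “foo” is dropped in both cases.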
Contributing
Ideas, improvements, and pull requests are always welcome. But please note: I can’t guarantee that I’ll have much time to work on this. So if you open a PR, thanks in advance for your patience!