
app catfish

catfish is a CLI tool that compares two directories by hashing all files. It reports which files are in the 'right' folder but not in 'left', regardless of how things were moved or renamed. Great for making sure your 'left' folder has all the files from the 'right' one.

2 releases

new 0.1.1 Dec 29, 2024
0.1.0 Dec 29, 2024

#151 in Filesystem


54 downloads per month

MIT license

9KB
78 lines

catfish 🥸

Because sometimes files pretend to be something they’re not.
(No matter the name or location, catfish will find out if they’re the same.)

I needed this functionality today and threw this tool together. I decided to share it here, in case someone else finds it useful!

Why “catfish”?

  • cat is a Unix tool.
  • fish … well, we’re fishing for the truth in your file system.
  • A catfish is a sneaky creature, just like files that might be identical under different names or locations.
  • catfish 🥸 unmasks duplicates for what they really are.

What does it do?

catfish recursively scans two folders—let’s call them “left” and “right”—and hashes every file (using SHA256).

  • If a file in the right folder has the same content (hash) as any file in the left folder, it won’t be listed.
  • We don’t check the file’s location in the left folder. Any matching hash anywhere in “left” is enough to exclude it.
  • We don’t check for duplicates within the left folder itself—if “left” has duplicates, that’s not our concern.
  • We can optionally ignore duplicates in the right folder, so that only the first occurrence of any given hash in “right” is shown.
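
The rules above can be sketched as a small filter over precomputed digests. This is a minimal illustration, not catfish's actual source: the function name `missing_from_left`, the `(digest, path)` tuple layout, and the placeholder digests "aaa" and "bbb" are all assumptions made for the example.

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns the (digest, path) entries from `right`
/// whose digest does not appear anywhere in `left`. When `ignore_duplicates`
/// is true, only the first entry per digest in `right` is kept.
fn missing_from_left(
    left: &[(String, String)],
    right: &[(String, String)],
    ignore_duplicates: bool,
) -> Vec<(String, String)> {
    // Paths in "left" are irrelevant; only the set of digests matters.
    // Duplicates inside "left" collapse naturally in the set.
    let left_digests: HashSet<&str> = left.iter().map(|(d, _)| d.as_str()).collect();

    let mut seen: HashSet<&str> = HashSet::new();
    let mut missing = Vec::new();
    for (digest, path) in right {
        if left_digests.contains(digest.as_str()) {
            continue; // same content exists somewhere in "left"
        }
        // `insert` returns false when the digest was already recorded.
        if ignore_duplicates && !seen.insert(digest.as_str()) {
            continue;
        }
        missing.push((digest.clone(), path.clone()));
    }
    missing
}

fn main() {
    // Placeholder digests stand in for real SHA256 hex strings.
    let left = vec![("aaa".to_string(), "foo/renamed.txt".to_string())];
    let right = vec![
        ("aaa".to_string(), "bar/original.txt".to_string()),
        ("bbb".to_string(), "bar/example.txt".to_string()),
        ("bbb".to_string(), "bar/example_dupe.txt".to_string()),
    ];
    for (digest, path) in missing_from_left(&left, &right, true) {
        println!("{digest} {path}");
    }
}
```

Keeping only a set of digests for "left" is what makes renames and moves invisible to the comparison: the path never takes part in the lookup.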

Backstory:

Some time ago, I switched from cloud provider X to cloud provider Y. I had both drives fully synced locally (a full copy, not a "lite" sync), so I copied all my files from X to Y and then turned off sync for X. But I forgot to delete the local X folder, and ended up adding new files to it by mistake. When I went to delete it, a simple path comparison with Y wasn't enough: I'd moved and renamed files in Y, so comparing by path would have produced a lot of false positives. What I really needed was to find out which files in X didn't exist anywhere in Y, so I could copy them over if necessary and then safely delete X without losing anything important.

Installation

  1. Ensure you have Rust and Cargo installed.
  2. Run:
    cargo install catfish
    
    or clone this repo and install from source:
    git clone https://github.com/samvdst/catfish.git
    cd catfish
    cargo install --path .
    
  3. That’s it! You can now run catfish from anywhere.

Usage

catfish [OPTIONS] <LEFT_FOLDER> <RIGHT_FOLDER>
  • -i, --ignore-duplicates: if there are multiple files with the same hash in RIGHT_FOLDER, only list the first occurrence.

Example

Suppose we have two folders: foo (left) and bar (right). In bar, we have a file that appears twice with identical content.

catfish foo bar
Files in "bar" but not in "foo":
f2d30353acf140ed51b1343368255c1201a7ee898acd60b25e207ff75555e12c bar/example.txt
f2d30353acf140ed51b1343368255c1201a7ee898acd60b25e207ff75555e12c bar/example_dupe.txt
af89f7d49b0c8ded732a9a2b3aff738cd1a3c1cd0d3635742adfee47faa31cba bar/another_file.txt

If we then run:

catfish foo bar --ignore-duplicates
Files in "bar" but not in "foo":
f2d30353acf140ed51b1343368255c1201a7ee898acd60b25e207ff75555e12c bar/example.txt
af89f7d49b0c8ded732a9a2b3aff738cd1a3c1cd0d3635742adfee47faa31cba bar/another_file.txt
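
Each output line pairs a content digest with a file path. As a rough sketch of how such pairs could be produced, the code below walks a directory recursively and digests every file using only the standard library. It is not catfish's implementation: the names `collect_files`, `digest_file`, and `scan` are hypothetical, and `DefaultHasher` is swapped in for SHA256 purely to keep the example dependency-free. It is not a cryptographic hash and should not be used for real comparisons.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::Hasher;
use std::io;
use std::path::{Path, PathBuf};

/// Collects every regular file under `dir`, descending into subdirectories.
/// Note: `is_dir` follows symlinks, so a symlink cycle would recurse forever.
fn collect_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_files(&path, out)?;
        } else if path.is_file() {
            out.push(path);
        }
    }
    Ok(())
}

/// Stand-in content digest (assumption: NOT SHA256, which catfish uses).
/// Reads the whole file into memory, which is fine for a sketch only.
fn digest_file(path: &Path) -> io::Result<String> {
    let bytes = fs::read(path)?;
    let mut hasher = DefaultHasher::new();
    hasher.write(&bytes);
    Ok(format!("{:016x}", hasher.finish()))
}

/// Returns (digest, path) pairs for every file under `dir`, sorted by path
/// because the order returned by `read_dir` is platform-dependent.
fn scan(dir: &Path) -> io::Result<Vec<(String, PathBuf)>> {
    let mut files = Vec::new();
    collect_files(dir, &mut files)?;
    files.sort();

    let mut entries = Vec::with_capacity(files.len());
    for path in files {
        let digest = digest_file(&path)?;
        entries.push((digest, path));
    }
    Ok(entries)
}

fn main() -> io::Result<()> {
    // Scan the directory given as the first argument, or "." by default.
    let dir = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    for (digest, path) in scan(Path::new(&dir))? {
        println!("{digest} {}", path.display());
    }
    Ok(())
}
```

Two files with identical bytes receive the same digest regardless of name or location, which is the property the listings above rely on.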

Contributing

Ideas, improvements, and pull requests are always welcome. But please note: I can’t guarantee that I’ll have much time to work on this. So if you open a PR, thanks in advance for your patience!

Dependencies

~2.8–9.5MB
~102K SLoC