app dirscan

A high performance tool for summarizing large directories or drives

10 releases (6 stable)

1.4.1 Sep 6, 2022
1.4.0 Mar 29, 2021
0.3.0 Apr 12, 2020
0.2.0 Apr 6, 2020
0.1.1 Apr 4, 2020

#675 in Filesystem

43 downloads per month

MIT license

740KB
712 lines

Dirscan

Dirscan is a high-performance tool for quickly inspecting the contents of huge (possibly networked) disks. It provides a summary of every single directory on a given disk, complete with the number of files within, their total size, and the latest time a file was created, accessed or modified.

It's designed for disks that are too large to inspect with traditional tools, and it:

  • Is many orders of magnitude faster than tools like du, find or tree
  • Can max out any disk you give it, assuming you have enough CPU resources to keep up
  • Produces simple JSON or CSV output that can be analysed by the built-in viewer or other tools
  • Supports a customisable number of threads
  • Streams results to the output file, keeping memory usage relatively constant regardless of disk size

Install 💿

Homebrew (macOS + Linux)

brew tap orf/brew, then brew install dirscan

Binaries (Windows)

Download the latest release from the GitHub releases page. Extract it and move it to a directory on your PATH.

Cargo

For optimal performance run cargo install dirscan

Docker

This project is packaged as a Docker container as tomforbes/dirscan.

Running docker run -v YOUR_DIRECTORY:/dir tomforbes/dirscan scan /dir will scan YOUR_DIRECTORY.
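
Docker treats a bare name such as data on the left of -v as a named volume rather than a host path, so it can help to resolve the directory to an absolute path first. A minimal sketch, assuming a POSIX shell; the helper name scan_in_docker_cmd is hypothetical, and it only prints the command rather than running it:

```shell
# Print the docker command for scanning a host directory (hypothetical helper).
# The directory is resolved to an absolute path so Docker bind-mounts it
# instead of creating a named volume.
scan_in_docker_cmd() {
  host_dir=$(cd "$1" && pwd) || return 1
  printf 'docker run -v "%s:/dir" tomforbes/dirscan scan /dir\n' "$host_dir"
}

# Show the command for the current directory.
scan_in_docker_cmd .
```

Once the printed command looks right, paste it into your shell to start the scan.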

Usage 🎷

Scan a directory

You can start scanning a directory by executing:

dirscan scan [PATH] --output=[OUTPUT]

This will scan [PATH] and write all results, in JSON format, to [OUTPUT]. By default it uses a thread pool with 2 * number_of_cores threads, but you can customize this. Depending on your disk speed, changing the number of threads can drastically improve performance:

dirscan scan [PATH] --output=[OUTPUT] --threads=20
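
As a rough illustration of the documented default, the sketch below computes 2 * number_of_cores in a POSIX shell so the value can be scaled explicitly. The helper name default_threads is hypothetical, getconf _NPROCESSORS_ONLN is assumed to be available (it is on Linux and macOS), and the path in the commented usage line is a placeholder:

```shell
# Print twice the number of online CPU cores, mirroring the documented default.
# Falls back to a single core if getconf cannot report the count.
default_threads() {
  cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
  echo $((cores * 2))
}

# Example (placeholder path): double the default for a high-latency network disk.
# dirscan scan /mnt/share --output=output.json --threads=$(( $(default_threads) * 2 ))
```

Whether more or fewer threads helps depends on the disk, so it is worth timing a few values on a small subdirectory first.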

You can also output the results in CSV:

dirscan scan [PATH] --output=[OUTPUT] --format=csv

$ dirscan scan ~/ --output=output.json --threads=20
[00:00:15] Files/s: 17324/s | Total: 258734 | Size: 99.01GB | Components: 14291 | Errors: IO=0 Other=36

Stream results

You can stream all files to stdout by executing:

dirscan stream [PATH]

If you wanted to remove all files on a disk in parallel, you could create a pipeline like:

dirscan stream /my-dir | xargs -d '\n' -L10 -P500 rm

This would launch up to 500 rm processes, each deleting 10 files.
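
Before pointing a pipeline like this at real data, it may be worth previewing the commands it would run. The sketch below is a dry run under a few assumptions: GNU xargs (for -d), a hypothetical fake_stream function standing in for dirscan stream, and made-up paths. It prefixes the command with echo so nothing is deleted:

```shell
# Stand-in for `dirscan stream /my-dir`: one path per line (made-up paths).
fake_stream() {
  printf '%s\n' '/my-dir/a.txt' '/my-dir/b c.txt' '/my-dir/d.txt'
}

# Dry run: `echo` prints each rm command instead of executing it.
# -d '\n' splits input on newlines only (GNU xargs), so a name containing
#         spaces is passed as a single argument.
# -L2     puts at most two paths on each command line.
# -P4     runs up to four commands at once.
# `--`    stops rm from treating a path that starts with '-' as an option.
dry_run() {
  fake_stream | xargs -d '\n' -L2 -P4 echo rm --
}

dry_run
```

Because of -P, the order of the printed lines is not guaranteed. Replacing fake_stream with the real dirscan stream call and dropping echo turns the preview into the actual deletion.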

Inspect results

Once a scan is complete you can inspect the output using:

dirscan parse [OUTPUT]

For example:

$ dirscan parse output.json --prefix=/System/
[00:00:02] Total: 580000 | Per sec: 220653/s
+----------------------+---------+----------+-------------+-------------+-------------+
| Prefix               | Files   | Size     | created     | accessed    | modified    |
+----------------------+---------+----------+-------------+-------------+-------------+
| /System/Applications | 57304   | 777.28MB | 2 weeks ago | 2 weeks ago | 2 weeks ago |
| /System/DriverKit    | 55      | 5.09MB   | 2 weeks ago | 2 weeks ago | 2 weeks ago |
| /System/Library      | 292190  | 13.56GB  | 7 hours ago | 1 hour ago  | 7 hours ago |
| /System/Volumes      | 1468296 | 197.93GB | 1 hour ago  | 1 hour ago  | 1 hour ago  |
| /System/iOSSupport   | 13856   | 600.20MB | 2 weeks ago | 2 weeks ago | 2 weeks ago |
+----------------------+---------+----------+-------------+-------------+-------------+

You can include more directories with the --depth flag, or change the prefix search with --prefix.

You can also order the results by name (the default), size or files:

$ dirscan parse output.json --prefix=/System/ --sort=size
[00:00:02] Total: 580000 | Per sec: 220653/s
+----------------------+---------+----------+-------------+-------------+-------------+
| Prefix               | Files   | Size     | created     | accessed    | modified    |
+----------------------+---------+----------+-------------+-------------+-------------+
| /System/Volumes      | 1468296 | 197.93GB | 2 hours ago | 2 hours ago | 2 hours ago |
| /System/Library      | 292190  | 13.56GB  | 7 hours ago | 2 hours ago | 7 hours ago |
| /System/Applications | 57304   | 777.28MB | 2 weeks ago | 2 weeks ago | 2 weeks ago |
| /System/iOSSupport   | 13856   | 600.20MB | 2 weeks ago | 2 weeks ago | 2 weeks ago |
| /System/DriverKit    | 55      | 5.09MB   | 2 weeks ago | 2 weeks ago | 2 weeks ago |
+----------------------+---------+----------+-------------+-------------+-------------+

Dependencies

~11–21MB
~284K SLoC