#disk #nfs #tool #file

app du2

Fast parallel file system lister / usage statistics summary

1 unstable release

0.2.1 Jun 28, 2020
0.2.0 Jun 28, 2020

#4 in #nfs

Unlicense OR MIT

50KB
1K SLoC

du2 0.2.0
Fast parallel file system lister / usage statistics summary

Latency vs throughput: The theory here is that parallel listing overcomes latency issues on remote files systems by
having multiple requests in play at once.  Usually remote file systems capable of good throughput will have higher
latency than local file systems largely because the OS owns a faster and exclusive cache to local file system metadata.

Each "opendir" is finished to completion so that the directory to minimize open time, but this costs more memory than
straight recursion. This might also contribute to better performance as it may reduce contention on that remote file
system versus holding the opendir open as you recurse a directory's children.  Sub directories found are queued for
other threads to query to completion, and therefore because the number of directories may be large the queue grows
unbounded.  The queue must be unbounded or a deadlock can occur as the worker is also a master (creator of new work).

Because in this application directories are evaluated in no particular order, it is necessary to aggregate lower
directories up the tree containing ALL directories for usage summaries. This tree is the bulk of the memory used and is
proportional to the tree directory count.

Symbolic links are not followed

USAGE:
    du2.exe [OPTIONS] <DIRECTORY>

OPTIONS:
    -d, --delimiter <delimiter>                Disk usage mode - do not write the files found [default: |]
        --exclude-re <exclude-re>              Exclude FILEs that match this RE
        --file-newer-than <file-newer-than>    Only count/sum entries newer than this age
        --file-older-than <file-older-than>    Only count/sum entries older than this age
    -h, --help                                 Prints help information
    -l                                         Write file list
        --extra                                
    -t, --worker-threads <no-threads>          Number worker threads [default: 0]
        --progress                             Writes progress stats on every ticker interval
        --re <re>                              Keep only FILEs that match this RE
        --write_thread_status                  Writes thread status every ticker interval - used to debug things
        --t_status_on_key                      Writes thread status when stdin sees a line entered by user
    -i, --ticker-interval <ticker-interval>    Interval at which stats are written - 0 means no ticker is run [default:
                                               200]
    -n <top-n-limit>                           Report top usage limit [default: 10]
    -u, --usage-trees                          Write disk usage summary
    -V, --version                              Prints version information
    -v                                         Verbosity - use more than one v for greater detail
        --write_thread_cpu_time                write cpu time consumed by each thread

ARGS:
    <DIRECTORY>    Directory to search

Dependencies

~6MB
~113K SLoC