new 0.5.1 Dec 16, 2024

MIT license

minifind

About

minifind is a barebones implementation of the Un*x find tool in Rust, meant to list directory entries as fast as possible and do little else. File names or paths can be matched with the --name or --regex options, and --case-insensitive makes either kind of match ignore case. To narrow matches further, the --file-type option filters by file type (b for block device, c for character device, d for directory, p for named FIFO, f for file, l for symlink, s for socket, or e for empty file/directory).
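To make the file-type letters concrete, here is a minimal sketch of how such a classification could be done with the Rust standard library on Unix. The function names `type_letter` and `is_empty` are hypothetical and are not taken from minifind's source; treating "empty" as a zero-length regular file or a directory with no entries is also an assumption.

```rust
use std::fs;
use std::os::unix::fs::FileTypeExt;
use std::path::Path;

/// Map a file type to the single-letter alias accepted by --file-type.
/// (Hypothetical helper; minifind's own code may differ.)
fn type_letter(ft: fs::FileType) -> Option<char> {
    if ft.is_symlink() {
        Some('l')
    } else if ft.is_dir() {
        Some('d')
    } else if ft.is_file() {
        Some('f')
    } else if ft.is_block_device() {
        Some('b')
    } else if ft.is_char_device() {
        Some('c')
    } else if ft.is_fifo() {
        Some('p')
    } else if ft.is_socket() {
        Some('s')
    } else {
        None
    }
}

/// Assumed meaning of 'e': a zero-length regular file or a directory
/// that contains no entries.
fn is_empty(path: &Path) -> std::io::Result<bool> {
    // symlink_metadata inspects the entry itself and never follows a symlink.
    let meta = fs::symlink_metadata(path)?;
    if meta.is_file() {
        Ok(meta.len() == 0)
    } else if meta.is_dir() {
        Ok(fs::read_dir(path)?.next().is_none())
    } else {
        Ok(false)
    }
}

fn main() -> std::io::Result<()> {
    let path = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    let meta = fs::symlink_metadata(&path)?;
    println!(
        "{:?} {} empty={}",
        type_letter(meta.file_type()),
        path,
        is_empty(Path::new(&path))?
    );
    Ok(())
}
```

Because the sketch calls `symlink_metadata`, a symlink is reported as `l` rather than as the type of its target.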

By default, it does not follow symlinks and does not cross filesystem boundaries. The number of threads defaults to the number of available CPU cores in the system.
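The two traversal defaults can be illustrated with a short, single-threaded sketch using only the standard library on Unix. This is not minifind's actual implementation, which is multi-threaded; `walk` and `default_threads` are hypothetical names. The sketch assumes that "not crossing filesystem boundaries" means comparing each directory's device ID with that of the starting path, as `find -xdev` does.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::{Path, PathBuf};
use std::thread;

/// List everything under `root` without following symlinks and without
/// descending into directories that live on a different filesystem.
fn walk(root: &Path) -> io::Result<Vec<PathBuf>> {
    // symlink_metadata never follows symlinks, so a link is seen as a link.
    let root_meta = fs::symlink_metadata(root)?;
    let root_dev = root_meta.dev();
    let mut found = vec![root.to_path_buf()];
    let mut stack = Vec::new();
    if root_meta.is_dir() {
        stack.push(root.to_path_buf());
    }
    while let Some(dir) = stack.pop() {
        let entries = match fs::read_dir(&dir) {
            Ok(entries) => entries,
            Err(_) => continue, // unreadable directory: skip it
        };
        for entry in entries.flatten() {
            let path = entry.path();
            let meta = match fs::symlink_metadata(&path) {
                Ok(meta) => meta,
                Err(_) => continue,
            };
            found.push(path.clone());
            // Descend only into real directories on the starting device.
            // A mount point is still listed, but not entered.
            if meta.is_dir() && meta.dev() == root_dev {
                stack.push(path);
            }
        }
    }
    Ok(found)
}

/// Thread count a tool could default to: the parallelism the OS reports.
fn default_threads() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

fn main() -> io::Result<()> {
    let root = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    for path in walk(Path::new(&root))? {
        println!("{}", path.display());
    }
    eprintln!("a parallel version could use {} threads", default_threads());
    Ok(())
}
```

A symlink to a directory appears in the output as an entry, but nothing beneath it is listed.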

Let us also mention other notable projects dealing with this task:

Usage

Usage: minifind [OPTIONS] <PATH>...

Arguments:
  <PATH>...  Paths to check for large directories

Options:
  -f, --follow-symlinks <FOLLOW_SYMLINKS>    Follow symlinks [default: false] [short aliases: L] [possible values: true, false]
  -o, --one-filesystem <ONE_FILESYSTEM>      Do not cross mount points [default: true] [aliases: xdev] [possible values: true, false]
  -x, --threads <THREADS>                    Number of threads to use when calibrating and scanning [default: 20]
  -d, --max-depth <MAX_DEPTH>                Maximum depth to traverse
  -n, --name <NAME>                          Base of the file name matching globbing pattern
  -r, --regex <REGEX>                        File name (full path) matching regular expression pattern
  -i, --case-insensitive <CASE_INSENSITIVE>  Case-insensitive matching for globbing and regular expression patterns [default: false] [possible values: true, false]
  -t, --file-type <FILE_TYPE>                Filter matches by type. Also accepts 'b', 'c', 'd', 'p', 'f', 'l', 's' and 'e' aliases [default: directory file symlink]
                                             [possible values: empty, block-device, char-device, directory, pipe, file, symlink, socket]
  -h, --help                                 Print help
  -V, --version                              Print version

Regular expressions

The --regex option uses Rust regex syntax, which is very similar to that of other engines but does not support look-around or backreferences.

Glob expressions

The --name option uses Unix-style glob syntax.
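As an illustration of what matching the base of the file name means, the following is a hand-rolled sketch that supports only `*` and `?`. It is not the glob engine minifind uses, and real Unix-style globs also support character classes such as `[a-z]`. The names `glob_match` and `name_matches` are hypothetical, and lowercasing both sides is only an assumed way to get case-insensitive matching.

```rust
use std::path::Path;

/// Match `text` against a pattern in which `*` matches any run of
/// characters and `?` matches exactly one character.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // Position to resume from when the most recent `*` must absorb more text:
    // (pattern index just after the `*`, text index where it stopped).
    let mut backtrack: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            backtrack = Some((pi + 1, ti));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some((bp, bt)) = backtrack {
            // Mismatch: let the last `*` swallow one more character and retry.
            pi = bp;
            ti = bt + 1;
            backtrack = Some((bp, bt + 1));
        } else {
            return false;
        }
    }
    // Any trailing `*` in the pattern can match the empty string.
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

/// Apply the pattern to the last path component only.
fn name_matches(pattern: &str, path: &Path, case_insensitive: bool) -> bool {
    let Some(name) = path.file_name() else {
        return false;
    };
    let name = name.to_string_lossy();
    if case_insensitive {
        glob_match(&pattern.to_lowercase(), &name.to_lowercase())
    } else {
        glob_match(pattern, &name)
    }
}

fn main() {
    let path = Path::new("/src/main.rs");
    println!("{}", name_matches("*.rs", path, false));
    println!("{}", name_matches("*.RS", path, true));
    println!("{}", name_matches("src*", path, false));
}
```

The last call does not match because only `main.rs`, not the leading `/src/`, is compared with the pattern. Matching against the full path is what --regex is for.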

Minifind vs GNU find

Hardware: 8-core Xeon E5-1630 with 4-drive SATA RAID-10

Benchmark setup:

$ cat bench1.sh
#!/bin/dash
exec /usr/bin/find / -xdev

$ cat bench2.sh
#!/bin/dash
exec /usr/local/sbin/minifind /

Benchmark results:

Benchmark 1: ./bench1.sh
  Time (mean ± σ):      4.655 s ±  0.160 s    [User: 1.287 s, System: 3.366 s]
  Range (min … max):    4.525 s …  5.016 s    10 runs

Benchmark 2: ./bench2.sh
  Time (mean ± σ):      1.244 s ±  0.020 s    [User: 3.921 s, System: 5.908 s]
  Range (min … max):    1.199 s …  1.271 s    10 runs

Summary
  ./bench2.sh ran
    3.74 ± 0.14 times faster than ./bench1.sh
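The summary figure can be re-derived from the two means and standard deviations reported above. The sketch below assumes the ± value is a first-order propagation of the two relative standard deviations; `speedup` is a hypothetical helper, not part of minifind or of the benchmarking tool. It also divides minifind's combined user and system time by its wall time, as a rough estimate of how many cores were kept busy.

```rust
/// Ratio of two mean times with a first-order propagated standard deviation.
/// Each argument is (mean, standard deviation) in seconds.
fn speedup(slow: (f64, f64), fast: (f64, f64)) -> (f64, f64) {
    let ratio = slow.0 / fast.0;
    // Relative uncertainties add in quadrature for a quotient.
    let rel = ((slow.1 / slow.0).powi(2) + (fast.1 / fast.0).powi(2)).sqrt();
    (ratio, ratio * rel)
}

fn main() {
    // Means and standard deviations from the benchmark output above.
    let (ratio, sigma) = speedup((4.655, 0.160), (1.244, 0.020));
    println!("{ratio:.2} ± {sigma:.2} times faster");

    // (user + system) / wall for the minifind run.
    let busy = (3.921 + 5.908) / 1.244;
    println!("about {busy:.1} cores busy on average");
}
```

Under that assumption the computation reproduces the 3.74 ± 0.14 figure. The roughly 7.9 cores of CPU time per second of wall time is consistent with the work being spread across threads, which also explains why minifind's total CPU time is higher than find's even though it finishes sooner.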
