#glob-pattern #pattern #glob #regex #pattern-match #directory-tree

app nym-cli

Manipulate files en masse using patterns

1 unstable release

0.1.0 May 19, 2021

#1563 in Filesystem

MIT license

135KB
3.5K SLoC

Nym

Nym is a cross-platform library and command line interface for manipulating files using patterns. It is inspired by and very loosely based upon mmv.

GitHub docs.rs crates.io

Usage

Nym commands are formed from flags, options, and patterns. Most commands are transforms composed of both a from-pattern to match source files and a to-pattern to resolve destination paths. Transforms include the append, copy, link and move commands. Some commands, such as find, use only a from-pattern.

Commands that write to the file system (i.e., transforms like copy) are interactive by default and print a manifest and then prompt to continue before writing. This behavior can be controlled with the --interactive option.

Nym operates exclusively on files (with the exception of the --parent/-p flag, which creates parent directories in destination paths derived from to-patterns). Commands never apply to directories. It is not possible to copy, link, or move directories, for example.

The following command copies all files in the working directory tree to a neighboring file with an appended .bak extension.

nym copy '**' '{#0}.bak'

Here, copy is the sub-command (also known as an actuator), ** is the from-pattern, and {#0}.bak is the to-pattern. Note that in most shells patterns must be escaped to avoid interacting with shell features like expansion. Quoting patterns usually prevents these unwanted interactions.

The following command finds all files beneath a src directory with either the .go or .rs extension.

nym find '**/src/**/*.{go,rs}'

From-Patterns

From-patterns match source files to actuate using Unix-like globs. These globs resemble literal paths, but additionally support wildcards, character classes, and alternatives that can be matched against paths on the file system. Matches provide capture text that can be used in to-patterns.

Globs are opinionated about path separators. Forward slash / is always the path separator and back slashes \ are forbidden (back slash is used for escape sequences, but the literal sequence \\ is not supported). Separators are normalized across platforms; glob patterns can match paths on Windows, for example.

Wildcards

Wildcards match some amount of arbitrary text in paths and are the most fundamental tool provided by globs.

The tree wildcard ** matches zero or more sub-directories. This is the only way to match against arbitrary directories; all other wildcards do not match across directory boundaries. When a tree wildcard participates in a match and does not terminate the pattern, its capture includes a trailing path separator. If a tree wildcard does not participate in a match, its capture is an empty string with no path separator. Tree wildcards must be delimited by path separators or a termination (such as the beginning and/or end of a glob or sub-glob). If a glob consists solely of a tree wildcard, then it matches all files in the working directory tree.

The zero-or-more wildcards * and $ match zero or more of any character except path separators. Zero-or-more wildcards cannot be adjacent to other zero-or-more wildcards. The * wildcard is eager and will match the longest possible text while the $ wildcard is lazy and will match the shortest possible text. When followed by a literal, * stops at the last occurrence of that literal while $ stops at the first occurence.

The exactly-one wildcard ? matches any single character except path separators. Exactly-one wildcards do not group automatically, so a pattern of contiguous wildcards such as ??? form distinct captures for each ? wildcard. An alternative can be used to group exactly-one wildcards into a single capture, such as {???} (see below).

Character Classes

Character classes match any single character from a group of literals and ranges except path separators. Classes are delimited by square brackets [...]. Individual character literals are specified as is, such as [ab] to match either a or b. Character ranges are formed from two characters separated by a hyphen, such as [x-z] to match x, y, or z.

Any number of character literals and ranges can be used within a single character class. For example, [qa-cX-Z] matches any of q, a, b, c, X, Y, or Z.

Character classes may be negated by including an exclamation mark ! at the beginning of the class pattern. For example, [!a] matches any character except for a.

Note that character classes can also be used to escape metacharacters like *, $, etc., though globs also support escaping via a backslash \. To match the control characters [, ], and - within a character class, they must be escaped via a backslash, such as [a\-] to match a or -.

Alternatives

Alternatives match an arbitrary sequence of comma separated sub-globs delimited by curly braces {...,...}. For example, {a?c,x?z,foo} matches any of the sub-globs a?c, x?z, or foo in order. Alternatives may be arbitrarily nested, such as in a{b,c{d,e{f,g}}}.

Alternatives form a single capture group regardless of the contents of their sub-globs. This capture is formed from the complete match of the sub-glob, so if the sub-glob a?c matches abc in {a?c,x?z}, then the capture text will be abc (not b as it would be outside of an alternative sequence). Alternatives can be used to group capture text using a single sub-glob, such as {*.{go,rs}} to capture an entire file name with a particular extension or {??} to group a sequence of exactly-one wildcards.

Sub-globs, in particular those containing wildcards, must consider neighboring patterns. For example, it is not possible to introduce a tree wildcard that is adjacent to anything but a path separator or termination, so foo{bar,baz/**} is allowed but foo{bar,**/baz} is not. Because it matches anything, singular tree wildcards are never allowed in sub-globs, so foo/{bar,**} is disallowed.

Literals and Platform-specific Features

Any components not recognized by globs are interpreted as literals. In combination with strict interpretations of path separators, this means some platform-specific features cannot be used as part of a from-pattern.

In particular, while from-patterns can be rooted, they cannot include schemes nor Windows path prefixes. On Windows, UNC paths or paths with other prefixes can be used via the --tree/-C option, which establishes the directory in which from-patterns are applied using native paths. For example, the following command copies all files from the UNC share path \\server\share\src.

nym copy -p --tree=\\server\share 'src/**' 'C:\\backup\\{#1}'

Globs do not explicitly support the notion of a parent directory. However, any invariant (literal) prefix is re-interpreted by the platform as a native path, so from-patterns that begin with .. behave as expected on Unix and Windows. For example, the following command intuitively operates in the parent of the current working directory.

nym find '../src/*.rs'

However, .. is interpreted as a literal and when it follows variant (non-literal) components in a glob it only matches paths with the literal component ... This never occurs when traversing directory trees, so .. literals following variant patterns like wildcards match nothing and should not be used. For example, the from-pattern src/**/../*.rs never yields any matching files.

To-Patterns

To-patterns resolve destination paths. These patterns consist of literals and substitutions. A substitution is either a capture from a corresponding from-pattern or a property that reads file metadata. Substitutions are delimited by curly braces {...}. Literals form a native path as-is.

Captures

Captures index a from-pattern using a hash followed by the index, like {#1}. These indices count from one; the zero index is used for the full text of a match. Empty braces also represent the full text of a match, so {#0} and {} are equivalent.

Captures may include a condition. Conditions specify substitution text based on whether or not the match text is empty. Conditions follow capture identifiers using a ternary-like syntax: they begin with a question mark ? followed by the non-empty case, a colon :, and finally the empty case. Each case supports literals, which specify alternative text delimited by square brackets [...]. In the non-empty case, a surrounding prefix and postfix can be used instead using two comma separated literals [...],[...]. Condition cases and substitution text may be empty.

For example, {#1?[],[-]:} is replaced by the matching text of the first capture and, when that text is non-empty, is followed by the postfix -. {#1?:[unknown]} is replaced by the matching text of the first capture and, when that text is empty, is replaced by the literal unknown.

Properties

Properties include source file metadata in the destination path and are specified by name following an exclamation mark !. Property names are case insensitive. Supported properties are described in the following table.

Pattern Metadata Type / Format Cargo Feature
{!b3sum} BLAKE3 hash digest digest property-b3sum (default)
{!ctime} creation timestamp date-time n/a
{!md5sum} MD5 hash digest digest property-md5sum (default)
{!mtime} modification timestamp date-time n/a

For example, {!b3sum} is replaced by the BLAKE3 hash digest of the matched file.

Properties are associated with a data type and corresponding format that transforms them into the output text of a substitution. Formats are optionally specified after a property name following a colon : and delimited by square brackets [...]. For example, the date-time data type uses a strftime-like format and the pattern {!mtime:[%Y]} outputs the text of the four-digit year of a source file's modification timestamp.

Properties may require additional dependencies and some can be toggled in a build using Cargo features.

Text Formatters

Substitutions (both captures and properties) support optional text formatters. Text formatters must appear last in a substitution following a vertical bar |. Any number of text formatters may be used separated by commas , and they are applied from left to right in the order in which they appear.

Text formatters are distinct from property formats and, as their name suggests, operate exclusively on the output text of a substitution (they do not operate on non-textual data).

The pad formatter pads substitution text to a specified width and alignment using the given character shim. For example, {#1|>4[0]} pads the substitution text into four columns using right alignment and the character 0 for padding. If the original substitution text is 13, then it becomes 0013 after formatting in this example. Left and center alignment are also supported via < and ^, respectively.

There are three casing formatters: lowercase, uppercase, and titlecase, with the case-insensitive patterns lower, upper, and title, respectively. These formatters take no parameters and change the casing of supported characters. Note that title is sensitive to word breaks, which only occur across whitespace and hyphens - (and not underscores _, for example).

The coalesce formatter replaces matching input characters with an output character. For example, {#1|%[_-][~]} replaces any instances of _ or - with a tilde ~.

Text formatters can be combined to perform complex formatting. For example, the following command extracts a part of file names delimited by underscores _ and formats that part using title casing with spaces.

nym move '$_$_*.mp4' '{#2|%[-][ ],title}.mp4'

Given a file named the-show-title_the-episode-title_the-encoding.mp4, the above transform would move it to The Episode Title.mp4.

Crates

Nym's core functionality is exposed as an independent library and front ends are developed atop this library. The following table describes the official Rust crates maintained in the Nym repository.

Crate Description
nym Library implementing Nym's core functionality.
nym-cli Binary for the nym command line interface (CLI).

The major and minor versions of these crates are upgraded together.

Installation

The nym binary can be installed in various ways described below.

Repository

To install nym from a clone of the repository, install Rust and then build and install nym using cargo.

git clone https://github.com/olson-sean-k/nym.git
cd nym/nym-cli
cargo install --locked --path=. --force

By default, this will build the master branch, which generally tracks tested upcoming changes. To install a specific release, checkout a version tag before using cargo install. Note that the build instructions may differ between versions; refer to the README in the clone.

git checkout v0.0.0

Registry

To install a release of nym from the crates.io Rust package registry, install Rust and then build and install nym using cargo.

cargo install nym-cli --locked --force

To install a specific release, use the --version option.

cargo install nym-cli --version=0.0.0 --locked --force

Package

The following table describes packages that can be used to install nym.

Platform Package
Arch (AUR) nym-git

Disclaimer

Nym is provided as is and with no warranty. At the time of writing, Nym is experimental and likely has bugs. Data loss may occur. Use at your own risk.

Dependencies

~10–22MB
~304K SLoC