#unicode #text

bin+lib unic-cli

UNIC Command-Line Tools

3 releases (breaking)

0.9.0 Mar 3, 2019
0.8.0 Jan 2, 2019
0.7.0 Feb 7, 2018

#389 in Internationalization (i18n)

26 downloads per month

MIT/Apache

38KB
493 lines

UNIC Command-Line Tools

Crates.io

This package provides command-line tools for developers, helping with common Unicode tasks, like generating, transforming, and inspecting strings.

How to Install

Install the unic-cli package using Cargo.

$ cargo install unic-cli

How to Use It

Echo Command

The unic-echo command generates Unicode strings, similar to the common echo command.

# Normally, the output ends with a newline
$ unic-echo Hello سلام
Hello سلام

# Get the codepoints or UTF-8/UTF-16, instead of plain text
$ unic-echo Hello سلام -o codepoints
U+0048 U+0065 U+006C U+006C U+006F U+0020 U+0633 U+0644 U+0627 U+0645

# Or see how to escape as a Rust string literal
$ unic-echo Hello سلام -o rust-escape
"Hello \u{633}\u{644}\u{627}\u{645}"

# Also the input can be in codepoints, or UTF-8/UTF-16 hex
$ unic-echo -i codepoints U+0 U+20 U+41 U+A0 -o rust-escape
"\u{0} A\u{a0}"

$ unic-echo --help
Write arguments to the standard output

USAGE:
    unic-echo [FLAGS] [OPTIONS] [STRINGS]...

FLAGS:
    -h, --help          Prints help information
    -n, --no-newline    No trailing newline
    -V, --version       Prints version information

OPTIONS:
    -i, --input <FORMAT>     Specify input format (see list below)
    -o, --output <FORMAT>    Specify output format (see list below)

ARGS:
    <STRINGS>...    Input strings (expected valid Unicode)

INPUT FORMATS:
    plain                   Plain Unicode characters (default)
    codepoints              Unicode codepoints (hex)
    utf8-hex                UTF-8 bytes (hex)
    utf16-hex               UTF-16 words (hex)

OUTPUT FORMATS:
    plain                   Plain Unicode characters (default)
    codepoints              Unicode codepoints (hex)
    utf8-hex                UTF-8 bytes (hex)
    utf16-hex               UTF-16 words (hex)

    braces-escape           String literal with \u{...} escapes for
    | js6-escape            control and non-ASCII characters
    | rust-escape

    braces-escape-all       String literal with \u{...} escapes for
    | js6-escape-all        all characters
    | rust-escape-all

    braces-escape-control   String literal with \u{...} escapes for
    | js6-escape-control    control characters
    | rust-escape-control

Inspector Command

The unic-inspector command lists the Unicode characters in the input string, along with their properties.

$ unic-inspector Hello سلام
 H | U+0048 | LATIN CAPITAL LETTER H | Uppercase_Letter
 e | U+0065 | LATIN SMALL LETTER E   | Lowercase_Letter
 l | U+006C | LATIN SMALL LETTER L   | Lowercase_Letter
 l | U+006C | LATIN SMALL LETTER L   | Lowercase_Letter
 o | U+006F | LATIN SMALL LETTER O   | Lowercase_Letter
   | U+0020 | SPACE                  | Space_Separator
 س | U+0633 | ARABIC LETTER SEEN     | Other_Letter
 ل | U+0644 | ARABIC LETTER LAM      | Other_Letter
 ا | U+0627 | ARABIC LETTER ALEF     | Other_Letter
 م | U+0645 | ARABIC LETTER MEEM     | Other_Letter

Soon, this tool will allow selecting which properties to show, as well as support grouping characters by Grapheme Clusters, Words, Sentences, etc.

Dependencies

~7MB
~111K SLoC