#fold #ascii #unicode #transliteration #cli

bin+lib superfold

A multilingual Rust library and CLI to process UTF-8 strings to exclude diacritics and fold non-phonetic graphemes into their phonetic ASCII representation

2 releases

new 0.1.1 May 16, 2025
0.1.0 May 16, 2025

#518 in Text processing

Download history 133/week @ 2025-05-11

133 downloads per month

MIT/Apache

37KB
623 lines

superfold


A multilingual Rust library and CLI tool that processes UTF-8 strings, stripping diacritics and folding non-phonetic graphemes into their phonetic ASCII representation (romanization by transliteration). The library preserves the original whitespace (spaces, tabs, newlines, etc.) and transforms only the word content and emoji. In practice this means that text in Japonic and Sino-Tibetan languages, such as Japanese and Chinese, is represented as ASCII, and that each emoji is replaced by its name enclosed in ":", so 🍆 becomes ":eggplant:".

Examples:

use superfold::fold;

assert_eq!(fold("北亰"), "BeiJing");
assert_eq!(fold("🦄"), ":unicorn:");

// Whitespace and structure are preserved:
assert_eq!(
    fold("  你好  世界\nNext line with piejlüsse  কথাটা 🦄!"),
    "  NiHao  ShiJie\nNext line with piejlusse  kotha :unicorn:!"
);
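
The whitespace-preserving behaviour described above can be pictured as splitting the input into alternating runs of whitespace and non-whitespace, transforming only the latter. The following is a minimal sketch of that idea using only the standard library; it is not superfold's implementation. Both `map_words_preserving_whitespace` and `toy_fold_word` are hypothetical names, and the toy table covers only a handful of accented Latin letters.

```rust
/// Applies `transform` to every run of non-whitespace characters and copies
/// every run of whitespace through unchanged.
/// (Hypothetical helper for illustration; not part of superfold's API.)
fn map_words_preserving_whitespace<F>(input: &str, transform: F) -> String
where
    F: Fn(&str) -> String,
{
    let mut out = String::with_capacity(input.len());
    let mut run_start = 0;
    // `None` until the first character has been seen.
    let mut run_is_space: Option<bool> = None;

    for (idx, ch) in input.char_indices() {
        let is_space = ch.is_whitespace();
        if let Some(prev) = run_is_space {
            if prev != is_space {
                // The run kind changed: flush the run that just ended.
                let run = &input[run_start..idx];
                if prev {
                    out.push_str(run);
                } else {
                    out.push_str(&transform(run));
                }
                run_start = idx;
            }
        }
        run_is_space = Some(is_space);
    }

    // Flush the final run, if the input was non-empty.
    if let Some(prev) = run_is_space {
        let run = &input[run_start..];
        if prev {
            out.push_str(run);
        } else {
            out.push_str(&transform(run));
        }
    }
    out
}

/// Toy stand-in for a real transliteration table (an assumption for this
/// sketch): folds a few precomposed accented Latin letters to ASCII.
fn toy_fold_word(word: &str) -> String {
    word.chars()
        .map(|c| match c {
            'ã' | 'á' | 'à' | 'â' | 'ä' => 'a',
            'é' | 'è' | 'ê' | 'ë' => 'e',
            'ü' | 'ú' | 'ù' | 'û' => 'u',
            'ç' => 'c',
            other => other,
        })
        .collect()
}
```

Calling `map_words_preserving_whitespace("  precisão\t piejlüsse\n", toy_fold_word)` leaves the leading spaces, the tab and the trailing newline where they were and changes only the two words.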

This library is inspired by great work of others such as:

CLI Usage

superfold can also be used as a command-line tool to process files and directories.

Installation:

If you have Rust installed, you can build and install the CLI:

cargo install --path . # Run from the root of the superfold project directory

Or, after building with cargo build --release, find the binary at target/release/superfold.

Usage:

superfold [OPTIONS] [INPUTS]...

Options:

  • -o, --output-dir <OUTPUT_DIR>: Output directory for processed files when multiple inputs or a directory are provided. Defaults to "superfold_output".
  • -f, --overwrite: Overwrite output files or directory if they already exist.
  • -h, --help: Print help information.
  • -V, --version: Print version information.

Examples:

  1. Fold a string from stdin:

    echo "precisão" | superfold
    

    Output:

    precisao
    
  2. Fold a single file (outputs to filename_folded.ext):

    superfold myfile.txt
    

    This will create myfile_folded.txt in the same directory.

  3. Fold specific files into an output directory:

    superfold file1.txt path/to/file2.log -o my_folded_texts
    

    This will create my_folded_texts/file1.txt and my_folded_texts/file2.log.

  4. Fold all text files in a directory (recursively) into an output directory:

    superfold ./input_documents --output-dir ./folded_documents
    

    This will process text files in ./input_documents and its subdirectories, replicating the structure in ./folded_documents.

  5. Overwrite existing output:

    superfold myfile.txt -f
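
    The `filename_folded.ext` naming from example 2 can be sketched with `std::path`. This is a hedged illustration of the naming rule only: `folded_path` is a hypothetical helper, not superfold's API, and the handling of files without an extension is an assumption.

```rust
use std::path::{Path, PathBuf};

/// Derives `<stem>_folded.<ext>` next to the input file.
/// (Hypothetical helper; how superfold treats extension-less files is an
/// assumption here: the suffix is simply appended to the name.)
fn folded_path(input: &Path) -> PathBuf {
    let stem = input
        .file_stem()
        .map(|s| s.to_string_lossy().into_owned())
        .unwrap_or_default();

    let mut name = format!("{stem}_folded");
    if let Some(ext) = input.extension() {
        name.push('.');
        name.push_str(&ext.to_string_lossy());
    }

    // Keep the parent directory and replace only the final component.
    input.with_file_name(name)
}
```

    With this sketch, `myfile.txt` maps to `myfile_folded.txt` and `docs/notes.md` maps to `docs/notes_folded.md`.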
    

Piping: superfold supports piping from stdin and to stdout, fitting into standard Unix pipelines:

cat long_text_file.txt | superfold > output.txt
echo "你好 🦄" | superfold | sed 's/:unicorn:/U/' # Example of further processing
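
For readers who want the same pipeline behaviour in their own Rust tool, a stdin-to-stdout filter can be sketched as below. This is an assumption-laden outline, not the superfold binary's source: `run_filter` is a hypothetical name, and the transform is passed in as a closure so that the sketch compiles without the crate.

```rust
use std::io::{self, Read, Write};

/// Reads all of `input` as UTF-8, applies `transform`, and writes the result
/// to `output`. (Hypothetical helper for illustration.)
fn run_filter<R, W, F>(mut input: R, mut output: W, transform: F) -> io::Result<()>
where
    R: Read,
    W: Write,
    F: Fn(&str) -> String,
{
    let mut text = String::new();
    // Fails with an `InvalidData` error if the input is not valid UTF-8.
    input.read_to_string(&mut text)?;
    output.write_all(transform(&text).as_bytes())?;
    output.flush()
}

// In a binary that depends on superfold, `main` could plausibly call:
//     run_filter(io::stdin().lock(), io::stdout().lock(), |s| superfold::fold(s))
// assuming `fold` accepts a `&str` and returns a `String` or something
// convertible to one, which the examples above suggest but do not confirm.
```

Taking generic `Read` and `Write` parameters instead of hard-coding stdin and stdout keeps the function testable with in-memory buffers.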

Dependencies

~3.5MB
~50K SLoC