#transliteration #pandoc #filter #text

bin+lib translitrs

Transliteration utility for Serbian language

3 releases

0.2.2 Feb 19, 2023
0.2.1 Jan 15, 2023
0.2.0 Dec 15, 2021

#698 in Text processing

21 downloads per month

MIT license

58KB
1.5K SLoC

translitRS — Transliterator for Serbian Language

TranslitRS is a command-line utility for transliterating Serbian text between the Cyrillic and Latin scripts. It can process plain text files directly, or run as a filter for the Pandoc document processor, which lets it handle formats such as Markdown, HTML, LaTeX and Microsoft Word.

Usage

Arguments

  • -i, --input <path>
    Read input from file
    Default: standard input
  • -o, --output <path>
    Write output to file
    Default: standard output
  • -f, --from <charset>
    Convert from character set
    Default: latin
  • -t, --into <charset>
    Convert to character set
    Default: cyrillic
  • -d, --skip-digraph
    Do not check for digraph exceptions
  • -u, --force-foreign
    Process words with foreign and mixed characters
  • -l, --force-links
    Process hyperlinks, email addresses and units
  • -p, --pandoc-filter
    Run in Pandoc JSON pipe filter mode
  • -v, --version
    Show version and quit
  • -h, --help
    Show usage help and quit
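
The digraph check that `-d` turns off exists because the Latin letter pairs lj, nj and dž normally map to a single Cyrillic letter (љ, њ, џ), yet in some words the two letters belong to different parts of the word and must stay separate. The following is a minimal sketch of that idea, not the crate's actual code: the function names, the three-word exception list and the lowercase-only handling are all assumptions made for illustration.

```rust
// Illustrative sketch only; not taken from the translitrs source.

/// Serbian Latin letters and their Cyrillic counterparts, in the same order.
const LATIN: &str = "abvgdđežzijklmnoprstćufhcčš";
const CYRILLIC: &str = "абвгдђежзијклмнопрстћуфхцчш";

/// Hypothetical exception list: lowercase words in which "nj" or "dž"
/// is two separate letters rather than a digraph.
const EXCEPTIONS: &[&str] = &["injekcija", "konjunkcija", "nadživeti"];

/// Map a single lowercase Latin letter to Cyrillic, if it has a counterpart.
fn single(c: char) -> Option<char> {
    LATIN
        .chars()
        .position(|l| l == c)
        .and_then(|i| CYRILLIC.chars().nth(i))
}

/// Map a two-letter digraph to its single Cyrillic letter.
fn digraph(a: char, b: char) -> Option<char> {
    match (a, b) {
        ('l', 'j') => Some('љ'),
        ('n', 'j') => Some('њ'),
        ('d', 'ž') => Some('џ'),
        _ => None,
    }
}

/// Transliterate one lowercase word. When `skip_digraph_check` is true,
/// the exception list is ignored and every digraph is merged.
fn to_cyrillic(word: &str, skip_digraph_check: bool) -> String {
    let keep_letters_apart = !skip_digraph_check && EXCEPTIONS.contains(&word);
    let chars: Vec<char> = word.chars().collect();
    let mut out = String::new();
    let mut i = 0;
    while i < chars.len() {
        if !keep_letters_apart && i + 1 < chars.len() {
            if let Some(c) = digraph(chars[i], chars[i + 1]) {
                out.push(c);
                i += 2;
                continue;
            }
        }
        // Characters without a mapping pass through unchanged.
        out.push(single(chars[i]).unwrap_or(chars[i]));
        i += 1;
    }
    out
}

fn main() {
    println!("{}", to_cyrillic("ljubav", false)); // љубав
    println!("{}", to_cyrillic("injekcija", false)); // инјекција
    println!("{}", to_cyrillic("injekcija", true)); // ињекција
}
```

A real implementation would need a much larger exception list and would have to handle capitalization; the sketch only shows why skipping the check changes the output for some words.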

Character sets

The available character sets and their shorthand codes are:

  • Serbian Latin
    latin, lat, l
  • Serbian Latin (Unicode)
    latin8, lat8, l8
  • Serbian Cyrillic
    cyrillic, cyr, c

Pandoc filter mode

When running as a Pandoc filter, the arguments listed above can't be passed directly. Instead, use the following environment variables:

  • CHARS_FROM=<charset>
    Convert from character set
  • CHARS_INTO=<charset>
    Convert to character set
  • SKIP_DIGRAPH=1
    Do not check for digraph exceptions
  • FORCE_FOREIGN=1
    Process words with foreign and mixed characters
  • FORCE_LINKS=1
    Process hyperlinks, email addresses and units
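
As a rough sketch of how a filter might turn these variables into settings, the example below resolves the shorthand charset codes and the `=1` flags. It is not the crate's implementation: `Charset`, `FilterSettings`, `parse_charset` and `settings_from` are hypothetical names, and two behaviours are assumptions, namely that filter mode uses the same latin-to-cyrillic defaults as the command line and that an unrecognized code falls back to the default.

```rust
use std::env;

// Illustrative sketch only; all type and function names are hypothetical.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Charset {
    Latin,
    LatinUnicode,
    Cyrillic,
}

/// Resolve a charset name or one of its shorthand codes.
fn parse_charset(code: &str) -> Option<Charset> {
    match code {
        "latin" | "lat" | "l" => Some(Charset::Latin),
        "latin8" | "lat8" | "l8" => Some(Charset::LatinUnicode),
        "cyrillic" | "cyr" | "c" => Some(Charset::Cyrillic),
        _ => None,
    }
}

#[derive(Debug, PartialEq, Eq)]
struct FilterSettings {
    from: Charset,
    into: Charset,
    skip_digraph: bool,
    force_foreign: bool,
    force_links: bool,
}

/// Build settings from a variable lookup. Taking the lookup as a closure
/// keeps the function testable without touching the process environment.
fn settings_from(lookup: impl Fn(&str) -> Option<String>) -> FilterSettings {
    // Assumption: a missing or unrecognized charset falls back to the default.
    let charset = |name: &str, default: Charset| {
        lookup(name)
            .and_then(|value| parse_charset(&value))
            .unwrap_or(default)
    };
    // A flag counts as set only when its value is exactly "1".
    let flag = |name: &str| lookup(name).as_deref() == Some("1");

    FilterSettings {
        from: charset("CHARS_FROM", Charset::Latin),
        into: charset("CHARS_INTO", Charset::Cyrillic),
        skip_digraph: flag("SKIP_DIGRAPH"),
        force_foreign: flag("FORCE_FOREIGN"),
        force_links: flag("FORCE_LINKS"),
    }
}

fn main() {
    // Read the real environment, as a Pandoc filter process would.
    let settings = settings_from(|name| env::var(name).ok());
    println!("{settings:?}");
}
```

Environment variables fit this mode because Pandoc starts the filter process itself, so the variables set on the `pandoc` command line are inherited by the filter.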

Examples

# Transliterate plaintext file from Latin (Unicode) to Cyrillic
translitrs -f lat8 -t cyr -i source.txt -o destination.txt

# Transliterate Microsoft Word document from Cyrillic to Latin
CHARS_FROM=c CHARS_INTO=l pandoc essay.docx --filter translitrs -o essay.docx

Dependencies

~2.3–3.5MB
~64K SLoC