no-std zalgo-codec-common

A crate for converting an ASCII text string to a single unicode grapheme cluster and back

25 releases (8 breaking)

0.10.4 Jan 14, 2024
0.10.1 Dec 21, 2023
0.9.2 Nov 28, 2023
0.8.3 Jul 29, 2023
0.2.6 Nov 20, 2022

MIT/Apache

zalgo-codec-common

A crate for converting a string containing only printable ASCII and newlines into a single unicode grapheme cluster and back. Provides the non-macro functionality of the crate zalgo-codec.

There are two ways of interacting with the codec. The first is to call the encoding and decoding functions directly, and the second is to use the ZalgoString wrapper type.

Examples

Encode a string to a grapheme cluster with zalgo_encode:

use zalgo_codec_common::zalgo_encode;
let s = "Zalgo";
let encoded = zalgo_encode(s)?;
assert_eq!(encoded, "É̺͇͌͏");

Decode a grapheme cluster back into a string:

use zalgo_codec_common::zalgo_decode;
let encoded = "É̺͇͌͏";
let s = zalgo_decode(encoded)?;
assert_eq!(s, "Zalgo");

The ZalgoString type can be used to encode a string and handle the result in various ways:

use zalgo_codec_common::ZalgoString;
let s = "Zalgo";
let zstr = ZalgoString::new(s)?;
assert_eq!(zstr, "É̺͇͌͏");
assert_eq!(zstr.len(), 2 * s.len() + 1);
assert_eq!(zstr.decoded_len(), s.len());
assert_eq!(zstr.bytes().next(), Some(69));
assert_eq!(zstr.decoded_chars().next_back(), Some('o'));

Explanation

Characters U+0300–U+036F are the combining characters for Unicode Latin (the Combining Diacritical Marks block). The fun thing about combining characters is that you can add as many of them as you like to the original character: doing so does not create any new symbols, it only stacks marks on top of the character. They are supposed to be used to create characters such as á by taking a normal a and adding another character to give it the mark (U+0301, in this case). Fun fact: Unicode doesn't specify any limit on the number of these characters.

Conveniently, this gives us 112 different characters we can map to, which comfortably covers the ASCII range 0x20 to 0x7F (96 values), i.e. all the non-control characters plus DEL. The only issue is that we can't have newlines in this system. To fix that, the accepted input drops 0x7F (DEL) and includes 0x0A (LF) instead. Encoding can be represented as (CHARACTER - 11) % 133 - 21, where CHARACTER is the ASCII value and the result is the offset from U+0300, and decoding as (CHARACTER + 22) % 133 + 10, where CHARACTER is that offset. Here % must be the non-negative (Euclidean) remainder, since LF gives 10 - 11 = -1. Under these formulas the printable characters land on offsets 0 to 94 and LF lands on offset 111 (U+036F).
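
The mapping described above can be sketched in plain Rust without the crate. This is a minimal illustration, not the crate's implementation. The function names encode_offset, decode_offset, sketch_encode and sketch_decode are hypothetical. It assumes that % means the Euclidean remainder (rem_euclid), that the encoded string starts with the base character E (byte 69, as in the ZalgoString example), and that DEL is rejected as non-printable.

```rust
/// Map a printable ASCII byte or LF to an offset from U+0300,
/// using the formula from the text: (c - 11) % 133 - 21.
/// Hypothetical helper; returns None for any other byte.
fn encode_offset(byte: u8) -> Option<u8> {
    if (0x20..0x7F).contains(&byte) || byte == b'\n' {
        // rem_euclid keeps the remainder non-negative, which matters
        // for LF, where 10 - 11 = -1.
        Some(((i16::from(byte) - 11).rem_euclid(133) - 21) as u8)
    } else {
        None
    }
}

/// Inverse of encode_offset: (offset + 22) % 133 + 10.
fn decode_offset(offset: u8) -> u8 {
    ((u16::from(offset) + 22) % 133 + 10) as u8
}

/// Encode text as 'E' followed by one combining character per input byte.
fn sketch_encode(text: &str) -> Option<String> {
    let mut out = String::from("E");
    for byte in text.bytes() {
        let offset = encode_offset(byte)?;
        out.push(char::from_u32(0x0300 + u32::from(offset))?);
    }
    Some(out)
}

/// Decode a string produced by sketch_encode.
fn sketch_decode(encoded: &str) -> Option<String> {
    let mut chars = encoded.chars();
    if chars.next()? != 'E' {
        return None;
    }
    let mut out = String::new();
    for c in chars {
        let code_point = c as u32;
        if !(0x0300..=0x036F).contains(&code_point) {
            return None;
        }
        let byte = decode_offset((code_point - 0x0300) as u8);
        // Reject offsets that do not correspond to an accepted input byte.
        if !((0x20..0x7F).contains(&byte) || byte == b'\n') {
            return None;
        }
        out.push(char::from(byte));
    }
    Some(out)
}

fn main() {
    let text = "Zalgo\nsays hi";
    let encoded = sketch_encode(text).expect("printable ASCII and LF only");
    // One byte for 'E' plus two UTF-8 bytes per combining character.
    assert_eq!(encoded.len(), 2 * text.len() + 1);
    assert_eq!(sketch_decode(&encoded).as_deref(), Some(text));
    println!("{encoded}");
}
```

Every character in U+0300–U+036F takes two bytes in UTF-8, which is why the encoded length in the ZalgoString example is 2 * s.len() + 1.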

Experiment with the codec

There is an executable available for experimenting with the codec on text and files. It can be installed with cargo install zalgo-codec --features binary. You can optionally enable the gui feature during installation to include a rudimentary GUI mode for the program.

License

Licensed under either of the Apache License, Version 2.0, or the MIT license, at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
