#chinese #character-encoding #base64 #encryption #binary-data #unicode-characters

bin+lib basehan

A data encryption method using Chinese characters. Kind of like base64.

8 releases

0.9.0 Jun 20, 2024
0.2.4 Jun 1, 2024
0.2.3 Feb 29, 2024
0.2.0 Oct 21, 2022
0.1.4 Oct 21, 2022

#1524 in Encoding

MIT license

22KB
435 lines

Base-CJK

Use CJK characters to encode binary data as text.

CJK character ranges:

- CJK Unified Ideographs: 4E00-9FFF (common)
- CJK Unified Ideographs Extension A: 3400-4DBF (rare)

This utility converts every 13 bits into a Unicode code point in the range `[4E00, 6E00)`, which holds exactly 2^13 code points. In addition, 8E00 is used as a functional character to show whether the final byte is split across 2 code points. In v1, to support streaming mode, the range `[6E00, 7E00)`, which has 2^12 code points, is used to indicate the end of the file. This avoids a trailing control character that carries no information and that would require the decoder to peek 1 character ahead, which is not always possible when streaming.
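The 13-bit mapping can be illustrated with a minimal sketch. This is not the crate's actual API: the names `encode_full_groups` and `decode_char` are hypothetical, the MSB-first bit order is an assumption, and the handling of the final partial group (the 8E00 marker and the `[6E00, 7E00)` end-of-file range) is deliberately left out because the text above does not specify it in enough detail.

```rust
/// First code point of the 13-bit data alphabet: [4E00, 6E00) holds 2^13 values.
const DATA_BASE: u32 = 0x4E00;
const BITS_PER_CHAR: u32 = 13;

/// Encode every complete 13-bit group of `input`.
/// Bit order is MSB-first, which is an assumption, not the crate's documented layout.
/// Returns the encoded text, the leftover bits, and how many leftover bits there are.
fn encode_full_groups(input: &[u8]) -> (String, u32, u32) {
    let mut out = String::new();
    let mut acc: u32 = 0; // bit accumulator, never holds more than 20 bits
    let mut nbits: u32 = 0; // number of valid bits currently in `acc`
    for &byte in input {
        acc = (acc << 8) | u32::from(byte);
        nbits += 8;
        if nbits >= BITS_PER_CHAR {
            nbits -= BITS_PER_CHAR;
            let value = (acc >> nbits) & 0x1FFF;
            // DATA_BASE + value lies in [4E00, 6E00), so it is always a valid char.
            out.push(char::from_u32(DATA_BASE + value).expect("valid code point"));
            acc &= (1 << nbits) - 1; // keep only the bits not yet emitted
        }
    }
    (out, acc, nbits)
}

/// Map one data character back to its 13-bit value, or `None` if it is
/// outside the data range (for example a marker or end-of-file character).
fn decode_char(c: char) -> Option<u32> {
    let cp = c as u32;
    if (DATA_BASE..DATA_BASE + (1 << BITS_PER_CHAR)).contains(&cp) {
        Some(cp - DATA_BASE)
    } else {
        None
    }
}
```

Under these assumptions, 13 input bytes (104 bits) map to exactly 8 characters with no leftover bits. Any other input length leaves between 1 and 12 trailing bits, which is the case the functional characters described above exist to handle.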

Dependencies

~1.5–2.2MB
~42K SLoC