kana-converter

A simple converter for half-width/full-width Japanese language characters (katakana, hiragana, and ASCII)

3 releases

Uses old Rust 2015

0.1.2 Aug 2, 2018
0.1.1 Aug 2, 2018
0.1.0 Aug 2, 2018

MIT license

Kana/ASCII Conversion Utility

Utilities for converting Japanese katakana and hiragana, along with ASCII characters, into full-width (zenkaku) or half-width (hankaku) forms.

Requirements for storing, reporting, and presenting data often dictate how Japanese scripts and ASCII characters must be represented. Many services require data in a particular format: for example, some financial services require all katakana to be sent as half-width, single-byte (hankaku) characters, with kanji and ASCII sent as full-width, double-byte (zenkaku) characters. In other use cases, data is stored entirely as double-byte (zenkaku) characters and transformed later, as required, by separate utilities. This library aims to help with these conversions, making it easier to store and send Japanese characters.

Terminology

Since various terms are used for Japanese character representation, which can be confusing, this list is provided to clarify the definitions:

| Term | Description |
| --- | --- |
| Half-width kana | Katakana characters which are represented by a single byte |
| Full-width kana | Katakana characters which are represented by two bytes |
| Single-byte | Characters whose values are stored in a single byte (half-width katakana, normal ASCII) |
| Double-byte | Characters whose values are stored in two bytes (full-width katakana, hiragana, kanji, double-byte ASCII) |
| Hankaku (半角) | The Japanese term for "half-width". Same definition as "Single-byte". |
| Zenkaku (全角) | The Japanese term for "full-width". Same definition as "Double-byte". |
| Katakana (カタカナ) | The Japanese Katakana character set, which can be stored in either one byte (half-width) or two (full-width) |
| Hiragana (ひらがな) | The Japanese Hiragana character set, which can only be stored as a two-byte value |
| Kanji (漢字) | The Japanese Kanji characters, which can only be stored as a two-byte value |
| Romaji (ローマ字) | Literally "Roman letters", the set of characters used to represent Japanese words in the Latin alphabet (using ASCII characters) |
| ASCII | The standard ASCII character set, which is usually stored in one byte, but also has a two-byte version for Japanese "Romaji" representation |
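The one-byte and two-byte sizes above match legacy Japanese encodings such as Shift_JIS. Rust strings are UTF-8, where the same characters occupy a different number of bytes, so it can help to think of width in terms of Unicode code points instead. The following is a small illustrative sketch using only the standard library; the helper name `describe` is hypothetical and not part of this crate.

```rust
/// Hypothetical helper (not part of this crate): report a character's
/// Unicode code point and its encoded size in UTF-8.
fn describe(c: char) -> String {
    format!("U+{:04X} uses {} byte(s) in UTF-8", c as u32, c.len_utf8())
}

fn main() {
    // Standard ASCII 'A', full-width 'Ａ', half-width 'ｱ', full-width 'ア',
    // hiragana 'あ', and kanji '漢'.
    for c in "AＡｱアあ漢".chars() {
        println!("{} {}", c, describe(c));
    }
}
```

Only the standard ASCII letter fits in a single UTF-8 byte. The half-width and full-width katakana differ by code point (for example U+FF71 versus U+30A2), not by their UTF-8 length.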

Usage

Add the crate to your Cargo.toml:

[dependencies]
kana-conversion = "0.1"

and use it in your Rust code:

extern crate kana_conversion;

use kana_conversion::to_double_byte;

Conversion Functions

The "to_double_byte" function takes a string slice to convert and a mode. With AsciiOnly, only standard ASCII characters are converted to double-byte (zenkaku) form; with KanaOnly, only half-width (hankaku) katakana are converted; with KanaAndAscii, both are converted.

The function returns an owned String holding the double-byte (zenkaku) characters, converted as specified by the mode.
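To make the three modes concrete, here is a minimal, self-contained sketch of how such a conversion could work. It is not the crate's implementation: the `Mode` enum, the `to_double_byte_sketch` function, and both helper functions are hypothetical names chosen for illustration, and the crate's real type names and signature may differ. The sketch assumes UTF-8 input, leaves the ASCII space unchanged, and does not merge voiced sound marks (dakuten) into the preceding kana.

```rust
/// Hypothetical mode type; the variant names follow the description above,
/// but the crate's actual enum name and signature are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Mode {
    AsciiOnly,
    KanaOnly,
    KanaAndAscii,
}

/// Full-width katakana in the same order as half-width U+FF66..=U+FF9D.
const FULL_WIDTH_KANA: &str =
    "ヲァィゥェォャュョッーアイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワン";

/// Printable ASCII U+0021..=U+007E maps to U+FF01..=U+FF5E by a fixed offset.
fn ascii_to_full_width(c: char) -> Option<char> {
    match c {
        '!'..='~' => char::from_u32(c as u32 + 0xFEE0),
        _ => None,
    }
}

/// Half-width katakana has no fixed offset, so look it up by position.
fn kana_to_full_width(c: char) -> Option<char> {
    match c {
        '\u{FF66}'..='\u{FF9D}' => {
            let index = (c as u32 - 0xFF66) as usize;
            FULL_WIDTH_KANA.chars().nth(index)
        }
        _ => None,
    }
}

/// Convert each character according to the mode; anything the mode does
/// not cover is copied through unchanged.
fn to_double_byte_sketch(input: &str, mode: Mode) -> String {
    input
        .chars()
        .map(|c| {
            let converted = match mode {
                Mode::AsciiOnly => ascii_to_full_width(c),
                Mode::KanaOnly => kana_to_full_width(c),
                Mode::KanaAndAscii => {
                    ascii_to_full_width(c).or_else(|| kana_to_full_width(c))
                }
            };
            converted.unwrap_or(c)
        })
        .collect()
}

fn main() {
    let input = "ABC ｱｲｳ";
    for mode in [Mode::AsciiOnly, Mode::KanaOnly, Mode::KanaAndAscii].iter() {
        println!("{:?}: {}", mode, to_double_byte_sketch(input, *mode));
    }
}
```

The ASCII range can use simple arithmetic because the full-width forms are laid out in the same order in Unicode, while katakana needs a lookup table because the half-width and full-width blocks are ordered differently.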

License

Licensed under the MIT license.

Contribution

Contributions are welcome and are subject to the terms of the license indicated above.