3 releases
Uses old Rust 2015
Version | Released |
---|---|
0.1.2 | Aug 2, 2018 |
0.1.1 | Aug 2, 2018 |
0.1.0 | Aug 2, 2018 |
Kana/ASCII Conversion Utility
Kana conversion utilities to convert Japanese katakana and hiragana, along with ASCII characters, into full-width (zenkaku) or half-width (hankaku) forms.
Requirements for storing, reporting, and presenting data vary with regard to Japanese scripts and ASCII characters. Many services require data in a particular format: for example, some financial services require all katakana to be sent as half-width, single-byte (hankaku) characters, with kanji and ASCII sent as full-width, double-byte (zenkaku) characters. In other cases, data is stored entirely as double-byte (zenkaku) characters and transformed later, as required, by separate utilities. This library aims to help with these conversions, making it easier to store and send Japanese characters.
Terminology
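Several overlapping terms are used for Japanese character representation, so the following table defines them. The byte counts refer to legacy Japanese encodings such as Shift_JIS; in UTF-8, which Rust strings use, half-width and full-width kana both occupy three bytes per character.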
Term | Description |
---|---|
Half-width kana | Katakana characters which are represented by a single byte |
Full-width kana | Katakana characters which are represented by two bytes |
Single-byte | Characters whose values are stored in a single byte (half-width katakana, normal ASCII) |
Double-byte | Characters whose values are stored in two bytes (full-width katakana, hiragana, kanji, double-byte ASCII) |
Hankaku (半角) | The Japanese term for "half-width". Same definition as "Single-byte". |
Zenkaku (全角) | The Japanese term for "full-width". Same definition as "Double-byte". |
Katakana (カタカナ) | The Japanese Katakana character set, which can be stored in either one byte (half-width) or two (full-width) |
Hiragana (ひらがな) | The Japanese Hiragana character set, which can only be stored as a two-byte value |
Kanji (漢字) | The Japanese Kanji characters, which can only be stored as a two-byte value |
Romaji (ローマ字) | Literally "Roman letters", the set of characters used to represent Japanese words in Latin alphabetical format (using ASCII characters). |
ASCII | The standard ASCII character set, which is usually stored in one byte, but also has a two-byte version for Japanese "Romaji" representation |
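The categories in the table can be told apart by Unicode code point. The following is a minimal sketch of such a classifier, not part of this crate: the `CharClass` enum and the `classify` function are hypothetical names introduced for illustration, and the kanji range covers only the basic CJK Unified Ideographs block.

```rust
/// Hypothetical character classes mirroring the terminology table.
/// These names are illustrative and are not part of kana-conversion.
#[derive(Debug, PartialEq)]
enum CharClass {
    Ascii,
    FullWidthAscii,
    HalfWidthKatakana,
    FullWidthKatakana,
    Hiragana,
    Kanji,
    Other,
}

/// Classify a character by the Unicode block it falls in.
fn classify(c: char) -> CharClass {
    match c as u32 {
        // Printable ASCII, including space.
        0x0020..=0x007E => CharClass::Ascii,
        // Full-width (zenkaku) forms of the printable ASCII characters.
        0xFF01..=0xFF5E => CharClass::FullWidthAscii,
        // Half-width (hankaku) katakana and the matching punctuation marks.
        0xFF61..=0xFF9F => CharClass::HalfWidthKatakana,
        // The regular (full-width) Katakana block.
        0x30A0..=0x30FF => CharClass::FullWidthKatakana,
        // The Hiragana block.
        0x3040..=0x309F => CharClass::Hiragana,
        // Basic CJK Unified Ideographs only; extension blocks are ignored here.
        0x4E00..=0x9FFF => CharClass::Kanji,
        _ => CharClass::Other,
    }
}
```

Calling `char::len_utf8` on the same characters shows how many bytes each one takes in a Rust string: one byte for plain ASCII, and three bytes for every other class listed above.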
Usage
Simply add the following to your Cargo.toml:
[dependencies]
kana-conversion = "0.1"
and use it in your Rust code:
extern crate kana_conversion;
use kana_conversion::to_double_byte;
Conversion Functions
The to_double_byte function takes a string slice to convert and a mode. If AsciiOnly is selected, only normal ASCII characters are converted to double-byte (zenkaku) form; if KanaOnly is selected, only half-width (hankaku) katakana are converted; KanaAndAscii converts both.
The return value is an owned String holding the double-byte (zenkaku) characters, converted as specified by the mode.
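To make the three modes concrete, here is a self-contained sketch of how such a conversion could work. It is not the crate's implementation. The variant names come from the description above, but the enum name `Mode`, the function name `to_double_byte_sketch`, and the helper functions are assumptions made for illustration. The sketch also assumes that an ASCII space maps to the ideographic space (U+3000), and it does not merge a half-width voicing mark with the preceding kana.

```rust
/// Hypothetical stand-in for the mode argument. The variant names come from
/// the README text; the enum name `Mode` is an assumption.
#[derive(Clone, Copy, PartialEq)]
enum Mode {
    AsciiOnly,
    KanaOnly,
    KanaAndAscii,
}

/// Full-width counterparts of U+FF61..=U+FF9F, listed in code-point order.
const FULL_WIDTH_KANA: &str = "。「」、・ヲァィゥェォャュョッーアイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワン゛゜";

/// Map a printable ASCII character to its full-width form, if it has one.
fn ascii_to_full_width(c: char) -> Option<char> {
    match c {
        // Assumption: a plain space becomes the ideographic space.
        ' ' => Some('\u{3000}'),
        // '!'..='~' sit at a fixed offset from U+FF01..=U+FF5E.
        '!'..='~' => std::char::from_u32(c as u32 + 0xFEE0),
        _ => None,
    }
}

/// Map a half-width katakana character to its full-width form by table lookup.
fn kana_to_full_width(c: char) -> Option<char> {
    match c {
        '\u{FF61}'..='\u{FF9F}' => {
            let index = (c as u32 - 0xFF61) as usize;
            FULL_WIDTH_KANA.chars().nth(index)
        }
        _ => None,
    }
}

/// Convert `input` according to `mode`, leaving all other characters as-is.
fn to_double_byte_sketch(input: &str, mode: Mode) -> String {
    input
        .chars()
        .map(|c| {
            let ascii = if mode != Mode::KanaOnly {
                ascii_to_full_width(c)
            } else {
                None
            };
            let kana = if mode != Mode::AsciiOnly {
                kana_to_full_width(c)
            } else {
                None
            };
            ascii.or(kana).unwrap_or(c)
        })
        .collect()
}
```

Under these assumptions, `to_double_byte_sketch("abc", Mode::AsciiOnly)` yields `ａｂｃ`, and the same call with `Mode::KanaOnly` returns the input unchanged. A production converter would also need to combine half-width voiced pairs such as ｶ followed by ﾞ into the single character ガ.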
License
Licensed under:
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
Contribution
Contributions are welcome and are subject to the terms of the license indicated above.