#japanese #unicode #kana #hankaku #zenkaku

bin+lib unicode-jp

A library to convert Japanese Half-width-kana[半角カナ] and Wide-alphanumeric[全角英数] into normal ones

4 releases (breaking)

Uses old Rust 2015

0.4.0 Apr 11, 2020
0.3.0 Mar 21, 2017
0.2.0 Jul 31, 2016
0.1.0 Jul 27, 2016

#824 in Text processing


988 downloads per month
Used in 7 crates (4 directly)

MIT license

27KB
371 lines

Unicode-JP (Rust)


Converters for troublesome characters found in Japanese text.

  • Half-width-kana[半角カナ;HANKAKU KANA] -> normal Katakana
  • Wide-alphanumeric[全角英数;ZENKAKU EISU] <-> normal ASCII

If you need to canonicalize text that includes Japanese, consider the unicode_normalization crate first; it provides NFD, NFKD, NFC and NFKC. This crate is useful when you are in a niche that requires delicate control over Japanese characters, for example when targeting a restrictive character terminal.

Japanese has two syllabaries, Hiragana and Katakana, and Half-width-kana is a further notation for them. These systems share two combinable diacritical marks, the Voiced-sound-mark and the Semi-voiced-sound-mark, and Unicode assigns three independent code points to each mark. In addition, Japanese text often uses a special style of Latin letters and Arabic numerals called Wide-alphanumeric. This small utility converts between these code points.
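To make the three representations of each mark concrete, here is a minimal sketch that uses only the Rust standard library. The constant and function names are hypothetical and are not part of this crate; only the code points themselves come from Unicode.

```rust
/// The three Unicode representations of the voiced sound mark.
/// All names in this sketch are illustrative, not part of the kana crate.
pub const VOICED_FULL: char = '\u{309B}'; // spacing, full-width
pub const VOICED_HALF: char = '\u{FF9E}'; // spacing, half-width
pub const VOICED_COMBINING: char = '\u{3099}'; // combining

/// The same three representations of the semi-voiced sound mark.
pub const SEMI_VOICED_FULL: char = '\u{309C}';
pub const SEMI_VOICED_HALF: char = '\u{FF9F}';
pub const SEMI_VOICED_COMBINING: char = '\u{309A}';

/// Returns true if `c` is any of the six mark code points above.
pub fn is_sound_mark(c: char) -> bool {
    matches!(c, '\u{3099}'..='\u{309C}' | '\u{FF9E}' | '\u{FF9F}')
}

/// Formats every char of `s` as "U+XXXX", so that marks which look alike
/// on screen can be told apart.
pub fn code_points(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:04X}", c as u32)).collect()
}
```

Printing the result of code_points() for a suspicious string is a quick way to see which of the three styles a given mark uses before choosing a conversion.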

API Reference

Example

Cargo.toml

[dependencies]
unicode-jp = "0.4.0"

src/main.rs

extern crate kana;
use kana::*;

fn main() {
    // Half-width kana input; the sound marks are separate half-width code points.
    let s1 = "ﾏﾂｵ ﾊﾞｼｮｳ ｱﾟ";
    assert_eq!("マツオ バショウ ア ゚", half2kana(s1));
    assert_eq!("マツオ バショウ ア゚", half2full(s1));

    let s2 = "ひ゜ひ゛んは゛";
    assert_eq!("ぴびんば", combine(s2));
    assert_eq!("ひ ゚ひ ゙んは ゙", vsmark2combi(s2));

    // Wide (full-width) alphanumerics and symbols.
    let s3 = "＃＆Ｒｕｓｔ－１．６！";
    assert_eq!("#&Rust-1.6!", wide2ascii(s3));
}

Functions of the kana crate:

  • wide2ascii(&str) -> String
    convert Wide-alphanumeric into normal ASCII [Ａ -> A]

  • ascii2wide(&str) -> String
    convert normal ASCII characters into Wide-alphanumeric [A -> Ａ]

  • half2full(&str) -> String
    convert Half-width-kana into normal Katakana with diacritical marks separated [ｱﾞﾊﾟ -> ア゙パ]
    This method is simple but tends to cause rendering problems. In that case, use half2kana() instead, or run vsmark2{full|half|combi} as a post-processing step.

  • half2kana(&str) -> String
    convert Half-width-kana into normal Katakana with diacritical marks combined [ｱﾞﾊﾟ -> ア゙パ]

  • combine(&str) -> String
    combine base characters and diacritical marks on Hiragana/Katakana [がハ゜ -> がパ]

  • hira2kata(&str) -> String
    convert Hiragana into Katakana [あ -> ア]

  • kata2hira(&str) -> String
    convert Katakana into Hiragana [ア -> あ]

  • vsmark2full(&str) -> String
    convert all separated Voiced-sound-marks into full-width style "\u{309B}"

  • vsmark2half(&str) -> String
    convert all separated Voiced-sound-marks into half-width style "\u{FF9E}"

  • vsmark2combi(&str) -> String
    convert all separated Voiced-sound-marks into space+combining style "\u{20}\u{3099}"

  • nowidespace(&str) -> String
    convert Wide-space into normal space ["　" -> " "]

  • space2wide(&str) -> String
    convert normal space into Wide-space [" " -> "　"]

  • nowideyen(&str) -> String
    convert Wide-yen into Half-width-yen ["￥" -> "¥"]

  • yen2wide(&str) -> String
    convert Half-width-yen into Wide-yen ["¥" -> "￥"]
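
Several of the conversions listed above are possible because Unicode lays the related blocks out in parallel. The following is a minimal sketch of that idea using only the standard library. It is not the crate's implementation, which this README does not show, and the function names ending in _sketch are hypothetical.

```rust
/// Wide forms U+FF01..=U+FF5E mirror ASCII U+0021..=U+007E at a fixed offset.
const WIDE_OFFSET: u32 = 0xFEE0;

/// Hiragana U+3041..=U+3096 and Katakana U+30A1..=U+30F6 are 0x60 apart.
const KANA_OFFSET: u32 = 0x60;

/// Sketch of a wide-to-ASCII conversion (hypothetical, not the crate's code).
pub fn wide_to_ascii_sketch(s: &str) -> String {
    s.chars()
        .map(|c| match c {
            '\u{FF01}'..='\u{FF5E}' => char::from_u32(c as u32 - WIDE_OFFSET).unwrap_or(c),
            _ => c,
        })
        .collect()
}

/// Sketch of the reverse direction: printable ASCII except space to wide forms.
pub fn ascii_to_wide_sketch(s: &str) -> String {
    s.chars()
        .map(|c| match c {
            '!'..='~' => char::from_u32(c as u32 + WIDE_OFFSET).unwrap_or(c),
            _ => c,
        })
        .collect()
}

/// Sketch of a Hiragana-to-Katakana conversion by shifting code points.
pub fn hira_to_kata_sketch(s: &str) -> String {
    s.chars()
        .map(|c| match c {
            '\u{3041}'..='\u{3096}' => char::from_u32(c as u32 + KANA_OFFSET).unwrap_or(c),
            _ => c,
        })
        .collect()
}
```

The space character is left alone in this sketch because the wide space U+3000 does not follow the 0xFEE0 offset; the list above handles it separately with nowidespace() and space2wide().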

TODO or NOT TODO

  • Voiced-sound-marks -> no space combining style "\u{3099}"
  • Half-width-kana <- normal Katakana
  • (normal/wide)tilde <-> Wave-dash
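
None of the items above exist in the crate. As a rough illustration of what the first and last items might involve, here is a minimal standard-library sketch. Every function name is hypothetical, and the choice of the wide tilde as the target of the reverse mapping is an assumption.

```rust
const TILDE: char = '\u{007E}'; // normal tilde
const WIDE_TILDE: char = '\u{FF5E}'; // full-width tilde
const WAVE_DASH: char = '\u{301C}'; // wave dash

/// Hypothetical: rewrite spacing sound marks as bare combining marks,
/// without the leading space that vsmark2combi() inserts.
pub fn marks_to_combining_sketch(s: &str) -> String {
    s.chars()
        .map(|c| match c {
            '\u{309B}' | '\u{FF9E}' => '\u{3099}', // voiced
            '\u{309C}' | '\u{FF9F}' => '\u{309A}', // semi-voiced
            _ => c,
        })
        .collect()
}

/// Hypothetical: replace normal and wide tildes with a wave dash.
pub fn tilde_to_wave_dash_sketch(s: &str) -> String {
    s.chars()
        .map(|c| if c == TILDE || c == WIDE_TILDE { WAVE_DASH } else { c })
        .collect()
}

/// Hypothetical: replace wave dashes with the wide tilde (an assumed target).
pub fn wave_dash_to_wide_tilde_sketch(s: &str) -> String {
    s.chars()
        .map(|c| if c == WAVE_DASH { WIDE_TILDE } else { c })
        .collect()
}
```

A bare combining mark attaches to whatever character precedes it, so a real implementation would need to decide what to do when a mark appears at the start of a string or after a non-kana character.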

Dependencies

~4MB
~85K SLoC