#chinese #simplified-chinese #traditional-chinese #convert #hanzi #localization

character_converter

Turn Traditional Chinese script to Simplified Chinese script and vice versa, and tokenize

12 releases (stable)

2.1.5 Nov 10, 2023
2.1.4 Oct 12, 2022
2.1.3 Sep 10, 2022
2.1.2 Aug 2, 2022
0.1.6 May 13, 2020

#1585 in Text processing


1,161 downloads per month
Used in 2 crates

MIT license

2.5MB
274 lines

About

Convert Traditional Chinese script to Simplified Chinese script and vice versa. Check a string's script to determine whether it is written in Traditional or Simplified Chinese characters.
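To illustrate the idea behind script detection and conversion, here is a minimal sketch that uses a character-to-character lookup table. It is not this crate's implementation: `toy_table`, `convert`, and `looks_traditional` are hypothetical names, and the three-entry table is an assumption standing in for a real mapping with thousands of entries.

```rust
use std::collections::HashMap;

/// Hypothetical toy Traditional -> Simplified table (illustration only).
fn toy_table() -> HashMap<char, char> {
    [('歐', '欧'), ('學', '学'), ('習', '习')]
        .iter()
        .copied()
        .collect()
}

/// Replace every character found in the table; leave all others unchanged.
fn convert(text: &str, table: &HashMap<char, char>) -> String {
    text.chars()
        .map(|c| *table.get(&c).unwrap_or(&c))
        .collect()
}

/// Treat the text as Traditional if it contains at least one character
/// that the table would rewrite.
fn looks_traditional(text: &str, table: &HashMap<char, char>) -> bool {
    text.chars().any(|c| table.contains_key(&c))
}
```

With this table, `convert("歐洲", &toy_table())` yields "欧洲", because 洲 has no entry and passes through unchanged. The reverse direction is harder to do with a plain character table, since one Simplified character can correspond to several Traditional ones (发 may be 發 or 髮).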

This package also includes a largest-first matching tokenizer, which at each position prefers the longest word it can match.
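As a rough sketch of what largest-first matching means, the function below scans the text from left to right and, at each position, tries the longest dictionary candidate before shorter ones. The function name `tokenize_largest_first`, the caller-supplied dictionary, and the choice to skip unmatched characters are all assumptions made for illustration; the crate's own tokenizer and word list may behave differently.

```rust
use std::collections::HashSet;

/// Greedy largest-first tokenizer over a caller-supplied dictionary
/// (hypothetical helper, not part of the crate's API).
fn tokenize_largest_first(text: &str, dict: &HashSet<&str>) -> Vec<String> {
    // Work on chars rather than bytes: Han characters are multi-byte in UTF-8.
    let chars: Vec<char> = text.chars().collect();
    // Length of the longest dictionary entry, measured in characters.
    let max_len = dict.iter().map(|w| w.chars().count()).max().unwrap_or(0);

    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let longest = max_len.min(chars.len() - i);
        let mut matched = false;
        // Try the largest window first, then shrink one character at a time.
        for len in (1..=longest).rev() {
            let candidate: String = chars[i..i + len].iter().collect();
            if dict.contains(candidate.as_str()) {
                tokens.push(candidate);
                i += len;
                matched = true;
                break;
            }
        }
        if !matched {
            // No dictionary word starts here (e.g. punctuation): skip it.
            i += 1;
        }
    }
    tokens
}
```

Given a dictionary containing 好, 好好, 学习, 天天, and 向上, the input "好好学习天天向上." is split into 好好, 学习, 天天, 向上: the two-character entry 好好 wins over the single 好, and the trailing period matches nothing and is dropped.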

Usage

extern crate character_converter;

use character_converter::{is_traditional, is_simplified, traditional_to_simplified, simplified_to_traditional, tokenize};

let traditional_text = "歐洲";
let simplified_text = "欧洲";

// Check script
assert!(is_traditional(traditional_text));

assert!(!is_simplified(traditional_text));

// Convert script
let result_three = traditional_to_simplified(traditional_text);
assert_eq!(result_three, simplified_text);

let result_four = simplified_to_traditional(simplified_text);
assert_eq!(result_four, traditional_text);

// Tokenize (the trailing "." is not returned as a token)
let string = "好好学习天天向上.";
let tokens = vec!["好好", "学习", "天天", "向上"];
assert_eq!(tokens, tokenize(string));

Benchmarks

Run benchmarks using the nightly bench feature:

cargo +nightly bench --features=bench

License

MIT

Dependencies

~2.5MB
~22K SLoC