#normalization #utf-8 #vietnamese #locdau

bin+lib tb_normalization

A library for normalizing UTF-8 strings and removing diacritics ("lọc dấu") from Vietnamese and some other languages.
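This page does not show how the crate removes diacritics. The sketch below illustrates the general idea of Vietnamese "lọc dấu" and is not tb_normalization's implementation. It assumes NFC (precomposed) input. Each accented vowel, plus `đ`/`Đ`, is mapped to its base ASCII letter through a lookup table, and the original letter case is kept. `remove_vietnamese_diacritics` is a hypothetical name, not part of the crate's API.

```rust
/// Accented lowercase Vietnamese letters grouped by their base letter.
/// Only precomposed (NFC) code points are listed.
const GROUPS: [(char, &str); 7] = [
    ('a', "àáảãạăằắẳẵặâầấẩẫậ"),
    ('e', "èéẻẽẹêềếểễệ"),
    ('i', "ìíỉĩị"),
    ('o', "òóỏõọôồốổỗộơờớởỡợ"),
    ('u', "ùúủũụưừứửữự"),
    ('y', "ỳýỷỹỵ"),
    ('d', "đ"),
];

/// Maps one character to its base letter, preserving case.
/// Characters not in the table are returned unchanged.
fn strip_char(c: char) -> char {
    // Compare against the lowercase table; uppercase forms like 'Ầ' or 'Đ'
    // lowercase to a single precomposed character.
    let lower = c.to_lowercase().next().unwrap_or(c);
    for (base, variants) in GROUPS.iter() {
        if variants.contains(lower) {
            return if c.is_uppercase() {
                base.to_ascii_uppercase()
            } else {
                *base
            };
        }
    }
    c
}

/// Hypothetical helper, not part of tb_normalization; assumes NFC input.
pub fn remove_vietnamese_diacritics(s: &str) -> String {
    s.chars().map(strip_char).collect()
}
```

With this sketch, `remove_vietnamese_diacritics("Được chưa nhỉ")` yields `"Duoc chua nhi"`. The table lists only precomposed code points, so decomposed (NFD) text would pass through with its combining marks intact. This is one reason Unicode normalization and diacritic removal tend to go together.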

2 releases (1 stable)

1.0.0 Feb 1, 2021
0.9.9 Feb 1, 2021


Custom license

5KB
69 lines

UTF8Normalizer

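```rust
// Trait that adds the normalization methods used below to string slices.
use tb_normalization::unicode::TbNormalization;

fn main() {
    // Vietnamese address with irregular spacing and punctuation.
    let s = "số 22 ngách 63/30/16 lê đức thọ , mỹ đình 2  Được chưa nhỉ  --";
    // Normalize the string.
    println!("{}", s.tb_normalization());
    // Remove special characters.
    println!("{}", s.remove_special_characters());
}
```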
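The example above does not state which characters `remove_special_characters` treats as "special." The following is a hypothetical illustration only. It assumes "special characters" means anything that is not a Unicode letter or digit. Those characters are replaced with spaces, and runs of whitespace are then collapsed. `strip_special_characters` is a name invented for this sketch and is not part of tb_normalization. Like the rest of this page's examples, it assumes NFC input.

```rust
/// Hypothetical helper, not part of tb_normalization.
/// Replaces every character that is not a Unicode letter or digit with a
/// space, then collapses whitespace runs into single spaces and trims the ends.
/// Assumes NFC input: in decomposed text, combining accents are not
/// alphanumeric and would split words apart.
pub fn strip_special_characters(s: &str) -> String {
    let replaced: String = s
        .chars()
        .map(|c| if c.is_alphanumeric() { c } else { ' ' })
        .collect();
    replaced.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

This sketch replaces punctuation with a space rather than deleting it. That way "63/30/16" becomes "63 30 16" instead of fusing into "633016". Applied to the sample address above, it yields `"số 22 ngách 63 30 16 lê đức thọ mỹ đình 2 Được chưa nhỉ"`.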

Dependencies

~2.8–4MB
~94K SLoC