#unicode #character #grapheme #unicode-characters #text

unicode_clusters

This crate exposes variable-width Unicode characters as single items, allowing array-like indexing.

3 releases

0.1.2 Dec 9, 2021
0.1.1 Dec 9, 2021
0.1.0 Dec 9, 2021


MIT license

8KB
182 lines

Unicode Clusters

Unicode Clusters is a library that treats each variable-width Unicode character (grapheme cluster) as a single item, allowing array-like indexing.
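To see why a cluster type is useful, compare the two notions of length that Rust's standard library offers for the crate's example string. This std-only sketch (the helper name `widths` is invented for illustration and is not part of the crate) shows that neither the UTF-8 byte length nor the `char` count equals the six user-perceived characters the crate's example expects.

```rust
/// Returns (UTF-8 byte length, number of `char`s) for `s`.
/// Neither value counts user-perceived characters once combining marks appear.
fn widths(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let input = "AȜनमस्ते";
    let (bytes, chars) = widths(input);
    // 21 UTF-8 bytes and 8 `char`s, versus 6 clusters in the crate's example.
    println!("bytes = {bytes}, chars = {chars}");

    // Indexing a Vec<char> splits the cluster "स्" into two separate items:
    // the consonant U+0938 and the virama U+094D.
    let v: Vec<char> = input.chars().collect();
    assert_eq!(v[4], '\u{0938}');
    assert_eq!(v[5], '\u{094D}');
}
```

Byte indexing is worse still: slicing `input` at an offset inside a multi-byte sequence panics.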

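#[test]
fn example() {
	// Assumes `GraphemeCluster` from the unicode_clusters crate is in scope.
	// Input: "A", the Latin letter yogh "Ȝ", then Devanagari "नमस्ते" ("namaste").
	let input = "AȜनमस्ते";

	let gcs = GraphemeCluster::graphemes(input);
	assert_eq!(gcs.len(), 6, "length");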

	assert_eq!(gcs[0].as_string(), "A");

	assert_eq!(gcs[1].as_string(), "Ȝ");
	assert_eq!(gcs[2].as_string(), "न");
	assert_eq!(gcs[3].as_string(), "म");
	assert_eq!(gcs[4].as_string(), "स्");
	assert_eq!(gcs[5].as_string(), "ते");

	// UTF-8 bytes per cluster: 1 for ASCII "A", 2 for "Ȝ", 3 per Devanagari code point.
	assert_eq!(gcs[0].as_bytes()[..], [65]);
	assert_eq!(gcs[1].as_bytes()[..], [200, 156]);
	assert_eq!(gcs[2].as_bytes()[..], [224, 164, 168]);
	assert_eq!(gcs[3].as_bytes()[..], [224, 164, 174]);
	assert_eq!(gcs[4].as_bytes()[..], [224, 164, 184, 224, 165, 141]);
	assert_eq!(gcs[5].as_bytes()[..], [224, 164, 164, 224, 165, 135]);
}
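The crate's internals are not shown here, so the following is only a hypothetical, std-only sketch of the idea behind the example above: split a string into `&str` slices by attaching a small, hand-picked set of combining marks to the preceding character. The names `toy_clusters` and `is_attached_mark` are invented for this illustration. This is not the full Unicode extended grapheme cluster algorithm (UAX #29) and will mis-segment many scripts, but it reproduces the six clusters and byte sequences from the test.

```rust
/// Hypothetical, deliberately partial list of marks that this toy segmenter
/// attaches to the preceding character. NOT the full UAX #29 rule set.
fn is_attached_mark(c: char) -> bool {
    matches!(
        c as u32,
        0x0300..=0x036F       // Combining Diacritical Marks
            | 0x0900..=0x0903 // Devanagari candrabindu, anusvara, visarga
            | 0x093A..=0x093C // vowel signs OE and OOE, nukta
            | 0x093E..=0x094F // dependent vowel signs and the virama (U+094D)
            | 0x0951..=0x0957 // stress signs and further vowel signs
            | 0x0962..=0x0963 // vowel signs vocalic L and LL
    )
}

/// Splits `input` into clusters; each cluster is a `&str` slice of `input`,
/// so each index of the returned Vec holds one whole cluster.
fn toy_clusters(input: &str) -> Vec<&str> {
    // Byte offsets at which a new cluster begins.
    let starts: Vec<usize> = input
        .char_indices()
        .filter(|&(i, c)| i == 0 || !is_attached_mark(c))
        .map(|(i, _)| i)
        .collect();

    // Each cluster runs from its start to the next start (or the end of input).
    starts
        .iter()
        .enumerate()
        .map(|(k, &s)| {
            let end = starts.get(k + 1).copied().unwrap_or(input.len());
            &input[s..end]
        })
        .collect()
}

fn main() {
    let clusters = toy_clusters("AȜनमस्ते");
    println!("{}", clusters.join(" | ")); // A | Ȝ | न | म | स् | ते
    println!("{:?}", clusters[4].as_bytes()); // [224, 164, 184, 224, 165, 141]
}
```

Returning borrowed slices avoids copying, and because every boundary comes from `char_indices`, slicing can never split a multi-byte sequence.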

Dependencies

~550KB