#unicode #skeleton #confusable #unicode-characters #text


This crate detects Unicode strings that look nearly identical once rendered but do not compare as equal. It defines "confusable" and "skeleton" according to Unicode Technical Standard #39 (UTS #39).

2 releases

Uses old Rust 2015

0.1.1 Oct 8, 2017
0.1.0 Oct 8, 2017


Unicode character "confusable" detection and "skeleton" computation, as specified by Unicode Technical Standard #39 (UTS #39). These functions are for working with strings that appear nearly identical once rendered but do not compare as equal.


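extern crate unicode_skeleton;

use unicode_skeleton::{UnicodeSkeleton, confusable};

fn main() {
    // The skeleton replaces each lookalike character with its common form.
    assert_eq!("𝔭𝒶ỿ𝕡𝕒ℓ".skeleton_chars().collect::<String>(), "paypal");
    // Two strings are confusable when their skeletons are equal.
    assert!(confusable("ℜ𝓊𝓈𝓉", "Rust"));
}
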

Add the following to your Cargo.toml to use it:

[dependencies]
unicode_skeleton = "0.1.0"


Transforms a Unicode string by replacing unusual characters with similar-looking common characters, as specified by Unicode Technical Standard #39 (UTS #39). For example, "ℜ𝓊𝓈𝓉" is transformed to "Rust". This simplified string is called the "skeleton".

use unicode_skeleton::UnicodeSkeleton;

"ℜ𝓊𝓈𝓉".skeleton_chars().collect::<String>() // "Rust"

Strings are considered "confusable" if they have the same skeleton. For example, "ℜ𝓊𝓈𝓉" and "Rust" are confusable.

use unicode_skeleton::confusable;

confusable("ℜ𝓊𝓈𝓉", "Rust") // true

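Because confusability is defined as skeleton equality, one common use is to index names by their skeleton and reject newcomers that collide with an existing entry. The following is a minimal sketch of that idea, not part of this crate: `NameRegistry` and `register` are hypothetical names, and the skeleton function is passed in as a parameter so the sketch does not assume any particular implementation.

```rust
use std::collections::HashMap;

/// Hypothetical registry that refuses names whose skeleton is already taken.
/// `F` is any function that computes a skeleton for a string.
struct NameRegistry<F: Fn(&str) -> String> {
    skeleton: F,
    /// Maps each skeleton to the first name registered under it.
    by_skeleton: HashMap<String, String>,
}

impl<F: Fn(&str) -> String> NameRegistry<F> {
    fn new(skeleton: F) -> Self {
        NameRegistry {
            skeleton,
            by_skeleton: HashMap::new(),
        }
    }

    /// Registers `name`, or returns the previously registered name that
    /// shares its skeleton. Note that re-registering an identical name is
    /// also reported as a collision.
    fn register(&mut self, name: &str) -> Result<(), String> {
        let key = (self.skeleton)(name);
        if let Some(existing) = self.by_skeleton.get(&key) {
            return Err(existing.clone());
        }
        self.by_skeleton.insert(key, name.to_string());
        Ok(())
    }
}
```

With this crate, the function passed to `NameRegistry::new` could be a closure that collects `skeleton_chars()` into a `String`. Storing the skeleton as the map key means each lookup costs one skeleton computation rather than a pairwise comparison against every registered name.
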
The translation to skeletons is based on the data of Unicode Security Mechanisms (UTS #39), version 10.0.0.
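
As a rough illustration of what such a data-driven translation involves, here is a minimal sketch that maps characters through a small prototype table. It is not the crate's implementation: `TOY_PROTOTYPES` and `toy_skeleton` are hypothetical names, the table holds only four hand-picked entries rather than the full Unicode confusables data, and the Unicode normalization steps that a complete skeleton computation applies are left out.

```rust
/// Toy prototype table: a tiny, hand-picked stand-in for the Unicode
/// confusables data. The real data has thousands of entries.
const TOY_PROTOTYPES: &[(char, &str)] = &[
    ('\u{211C}', "R"),  // ℜ BLACK-LETTER CAPITAL R
    ('\u{1D4CA}', "u"), // 𝓊 MATHEMATICAL SCRIPT SMALL U
    ('\u{1D4C8}', "s"), // 𝓈 MATHEMATICAL SCRIPT SMALL S
    ('\u{1D4C9}', "t"), // 𝓉 MATHEMATICAL SCRIPT SMALL T
];

/// Hypothetical helper: replaces every character that has an entry in the
/// toy table with its prototype and keeps all other characters unchanged.
/// Normalization is deliberately omitted in this sketch.
fn toy_skeleton(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match TOY_PROTOTYPES.iter().find(|&&(from, _)| from == c) {
            Some(&(_, proto)) => out.push_str(proto),
            None => out.push(c),
        }
    }
    out
}
```

A prototype is stored as a string rather than a single character because one confusable character can map to a sequence of several characters. A linear scan is adequate for four entries; a table of realistic size would call for a sorted array with binary search or a hash map.
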

