#unicode-characters #unicode #name #table #bidirectional #character #length

no-std unicode_names

Map characters to and from their name given in the Unicode standard. This goes to great lengths to be as efficient as possible in both time and space, with the full bidirectional tables weighing barely 500 KB but still offering O(1)* look-up in both directions. (*more precisely, O(length of name).)

8 releases

Uses old Rust 2015

0.1.7 May 4, 2015
0.1.6 Apr 8, 2015
0.1.5 Mar 16, 2015
0.1.4 Jan 30, 2015
0.1.0 Nov 14, 2014

#1789 in Text processing


Used in 2 crates (via unicode_names_macros)

MIT/Apache

1.5MB
19K SLoC

unicode_names


Time- and memory-efficient mapping of characters to and from their Unicode 7.0 names, at runtime and at compile time.

extern crate unicode_names;

fn main() {
    // Both functions return an Option, so print with {:?} rather than {}.
    println!("☃ is called {:?}", unicode_names::name('☃')); // SNOWMAN
    println!("{:?} is happy", unicode_names::character("white smiling face")); // ☺
    // (NB. case insensitivity)
}

The maps are compressed using similar tricks to Python's unicodedata module, although those here are about 70KB (12%) smaller.
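To give a feel for what such compression can look like, here is a minimal sketch of one common technique: storing each distinct word of the names once and encoding every name as a sequence of word indices. This is an assumption for illustration only; the crate's actual table layout is not described here, and the names WordTable, compress and expand are hypothetical.

```rust
use std::collections::HashMap;

/// Names stored as sequences of indices into a shared word list.
/// (Hypothetical layout, not the crate's real tables.)
struct WordTable {
    words: Vec<String>,   // each distinct word, stored once
    names: Vec<Vec<u16>>, // one index sequence per name
}

/// Build the shared word list and the per-name index sequences.
fn compress(names: &[&str]) -> WordTable {
    let mut index: HashMap<&str, u16> = HashMap::new();
    let mut words: Vec<String> = Vec::new();
    let mut encoded = Vec::new();
    for &name in names {
        let mut seq = Vec::new();
        for word in name.split(' ') {
            // Reuse the existing index for a known word, else append it.
            let next = words.len() as u16;
            let id = *index.entry(word).or_insert_with(|| {
                words.push(word.to_string());
                next
            });
            seq.push(id);
        }
        encoded.push(seq);
    }
    WordTable { words, names: encoded }
}

/// Rebuild the i-th name from its word indices.
fn expand(table: &WordTable, i: usize) -> String {
    table.names[i]
        .iter()
        .map(|&id| table.words[id as usize].as_str())
        .collect::<Vec<_>>()
        .join(" ")
}
```

Because Unicode names share many words (WHITE, BLACK, STAR, LETTER and so on), a scheme like this stores each word's bytes once instead of once per name.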



lib.rs:

Convert between characters and their standard names.

This crate provides two functions for mapping between a char and the name given to it by the Unicode standard (7.0), one for each direction. There are no runtime requirements, so the crate is usable with only core (this requires enabling the no_std cargo feature). The tables are heavily compressed but still large (500KB), and they offer efficient O(1) look-ups in both directions (more precisely, O(length of name)).

extern crate unicode_names;

fn main() {
    println!("☃ is called {:?}", unicode_names::name('☃')); // SNOWMAN
    println!("{:?} is happy", unicode_names::character("white smiling face")); // ☺
    // (NB. case insensitivity)
}
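The shape of the two look-ups, and why the cost is better described as O(length of name), can be sketched with plain hash maps. This is only an illustration under stated assumptions: ToyNames and its three-entry TABLE are hypothetical, and the real crate uses compressed tables rather than std's HashMap.

```rust
use std::collections::HashMap;

// Hypothetical three-entry table; the real crate covers all of Unicode 7.0.
const TABLE: &[(char, &str)] = &[
    ('\u{2603}', "SNOWMAN"),
    ('\u{263A}', "WHITE SMILING FACE"),
    ('\u{2605}', "BLACK STAR"),
];

/// Toy bidirectional map: one hash map per direction.
struct ToyNames {
    by_char: HashMap<char, &'static str>,
    by_name: HashMap<&'static str, char>,
}

impl ToyNames {
    fn new() -> ToyNames {
        let mut by_char = HashMap::new();
        let mut by_name = HashMap::new();
        for &(c, name) in TABLE {
            by_char.insert(c, name);
            by_name.insert(name, c);
        }
        ToyNames { by_char, by_name }
    }

    /// char -> name: a single hash look-up.
    fn name(&self, c: char) -> Option<&'static str> {
        self.by_char.get(&c).copied()
    }

    /// name -> char, ignoring ASCII case. Uppercasing and hashing each
    /// read every byte of the name once, hence O(length of name).
    fn character(&self, name: &str) -> Option<char> {
        let upper = name.to_ascii_uppercase();
        self.by_name.get(upper.as_str()).copied()
    }
}
```

In this sketch, ToyNames::new().character("white smiling face") finds the same entry as the upper-case spelling, mirroring the case insensitivity noted in the example above.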


Macros

The associated unicode_names_macros crate provides two macros for converting at compile-time, giving named literals similar to Python's "\N{...}".

  • named_char!(name) takes a single string name and creates a char literal.
  • named!(string) takes a string and replaces any \\N{name} sequences with the character with that name. NB. String escape sequences cannot be customised, so the extra backslash (or a raw string) is required.
#![feature(plugin)]
#![plugin(unicode_names_macros)]

fn main() {
    let x: char = named_char!("snowman");
    assert_eq!(x, '☃');

    let y: &str = named!("foo bar \\N{BLACK STAR} baz qux");
    assert_eq!(y, "foo bar ★ baz qux");
}
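For readers who want to see what the \N{name} substitution amounts to, here is a runtime sketch of the same replacement. It is not the macro's implementation: the real macros do this work at compile time, and both lookup (a three-name stand-in for the crate's table) and expand_named are hypothetical names introduced for illustration.

```rust
/// Hypothetical stand-in for the crate's name table (ASCII case-insensitive).
fn lookup(name: &str) -> Option<char> {
    match name.to_ascii_uppercase().as_str() {
        "SNOWMAN" => Some('\u{2603}'),
        "BLACK STAR" => Some('\u{2605}'),
        "WHITE SMILING FACE" => Some('\u{263A}'),
        _ => None,
    }
}

/// Replace every `\N{name}` in `input` with the named character.
/// Returns None if a name is unknown or a brace is left unclosed.
fn expand_named(input: &str) -> Option<String> {
    let mut out = String::new();
    let mut rest = input;
    // "\\N{" is the three-character sequence backslash, N, open brace.
    while let Some(start) = rest.find("\\N{") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 3..];
        let end = after.find('}')?;
        out.push(lookup(&after[..end])?);
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Some(out)
}
```

Where this sketch returns None for an unknown name, a compile-time macro would be expected to report an error during compilation instead.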

Cargo-enabled

This package is on crates.io, so add either (or both!) of the following to your Cargo.toml.

[dependencies]
unicode_names = "0.1"
unicode_names_macros = "0.1"

No runtime deps