no-std unicode_names2

Map characters to and from their name given in the Unicode standard. This goes to great lengths to be as efficient as possible in both time and space, with the full bidirectional tables weighing barely 500 KB but still offering O(1)* look-up in both directions. (*more precisely, O(length of name).)

4 releases (2 breaking)

Uses old Rust 2015

0.4.0 Mar 17, 2020
0.3.0 Jul 26, 2019
0.2.2 Jun 17, 2018
0.2.1 Jun 17, 2018
0.2.0 Jun 17, 2018

#162 in Text processing

2,643 downloads per month
Used in 26 crates (11 directly)

MIT/Apache

2MB
22K SLoC

unicode_names2

Time- and memory-efficient mapping of characters to and from their Unicode 8.0 names, at runtime and at compile time.

extern crate unicode_names2;

fn main() {
    // Both functions return an Option, so the results are printed with {:?}.
    println!("☃ is called {:?}", unicode_names2::name('☃')); // SNOWMAN
    println!("{:?} is happy", unicode_names2::character("white smiling face")); // ☺
    // (NB. case insensitivity)
}

The maps are compressed using tricks similar to those in Python's unicodedata module, although the tables here are about 70 KB (12%) smaller.

lib.rs:

Convert between characters and their standard names.

This crate provides two functions for mapping between a char and the name given to it by the Unicode standard (8.0). There are no runtime requirements, so it is usable with only core (this requires enabling the no_std Cargo feature). The tables are heavily compressed but still large (500 KB), and they offer efficient O(1) look-ups in both directions (more precisely, O(length of name)).

extern crate unicode_names2;

fn main() {
    println!("☃ is called {:?}", unicode_names2::name('☃')); // SNOWMAN
    println!("{:?} is happy", unicode_names2::character("white smiling face")); // ☺
    // (NB. case insensitivity)
}
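
To make the shape of the two look-ups concrete, here is a minimal sketch that uses only the standard library. It is not how the crate stores its data: the crate uses heavily compressed tables, whereas this toy version uses two hash maps over three hand-picked names. The names ToyNames, by_char and by_name are hypothetical and exist only for illustration. What it does share with the description above is the contract: both directions return an Option, name look-up ignores case, and the cost of a look-up by name is dominated by hashing the name, that is, O(length of name).

```rust
use std::collections::HashMap;

/// Toy bidirectional table (hypothetical; the real crate uses compressed
/// tables covering the whole Unicode name list, not hash maps).
struct ToyNames {
    by_char: HashMap<char, &'static str>,
    by_name: HashMap<&'static str, char>,
}

impl ToyNames {
    /// Build both directions from a small, hand-written list of pairs.
    fn new() -> Self {
        let pairs = [
            ('\u{2603}', "SNOWMAN"),
            ('\u{263A}', "WHITE SMILING FACE"),
            ('\u{2605}', "BLACK STAR"),
        ];
        let mut by_char = HashMap::new();
        let mut by_name = HashMap::new();
        for &(c, n) in pairs.iter() {
            by_char.insert(c, n);
            by_name.insert(n, c);
        }
        ToyNames { by_char, by_name }
    }

    /// Character to name; None if the toy table does not know the character.
    fn name(&self, c: char) -> Option<&'static str> {
        self.by_char.get(&c).copied()
    }

    /// Name to character, ignoring ASCII case; None for unknown names.
    fn character(&self, name: &str) -> Option<char> {
        // Names are stored in upper case, so normalise the query first.
        let key = name.to_ascii_uppercase();
        self.by_name.get(key.as_str()).copied()
    }
}
```

With `let names = ToyNames::new();`, calling `names.character("white smiling face")` yields `Some('☺')`, and an unknown name yields `None` rather than panicking.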

Macros

The associated unicode_names2_macros crate provides two macros for converting at compile-time, giving named literals similar to Python's "\N{...}".

  • named_char!(name) takes a single string name and creates a char literal.
  • named!(string) takes a string and replaces any \\N{name} sequences with the character of that name. NB. String escape sequences cannot be customised, so the extra backslash is required unless you use a raw string.

#![feature(proc_macro_hygiene)]

#[macro_use]
extern crate unicode_names2_macros;

fn main() {
    let x: char = named_char!("snowman");
    assert_eq!(x, '☃');

    let y: &str = named!("foo bar \\N{BLACK STAR} baz qux");
    assert_eq!(y, "foo bar ★ baz qux");

    let y: &str = named!(r"foo bar \N{BLACK STAR} baz qux");
    assert_eq!(y, "foo bar ★ baz qux");
}
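
For intuition about what named! does with a string, the substitution can be sketched as an ordinary runtime function. This is an illustration only, not the macro's implementation: the macro works at compile time, while the hypothetical expand_named below runs at runtime and reports problems through a Result. The helper lookup is also hypothetical and knows just three names; in real code it could plausibly be replaced by unicode_names2::character.

```rust
/// Hypothetical stand-in for a full name table: knows only three names,
/// matched without regard to ASCII case.
fn lookup(name: &str) -> Option<char> {
    match name.to_ascii_uppercase().as_str() {
        "SNOWMAN" => Some('\u{2603}'),
        "BLACK STAR" => Some('\u{2605}'),
        "WHITE SMILING FACE" => Some('\u{263A}'),
        _ => None,
    }
}

/// Replace every `\N{name}` sequence in `input` with the named character.
/// Returns Err with a message for an unknown name or a missing `}`.
fn expand_named(input: &str) -> Result<String, String> {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    // "\\N{" is the three-byte sequence backslash, 'N', '{'.
    while let Some(start) = rest.find("\\N{") {
        // Copy everything before the escape sequence unchanged.
        out.push_str(&rest[..start]);
        let after = &rest[start + 3..];
        let end = after
            .find('}')
            .ok_or_else(|| String::from("unterminated \\N{ sequence"))?;
        let name = &after[..end];
        let c = lookup(name)
            .ok_or_else(|| format!("unknown character name: {}", name))?;
        out.push(c);
        // Continue scanning after the closing brace.
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}
```

As with the macro, the input must contain a literal backslash, so a caller writes either `expand_named("\\N{BLACK STAR}")` or `expand_named(r"\N{BLACK STAR}")`.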

Cargo-enabled

This package is on crates.io, so add either (or both!) of the following to your Cargo.toml.

[dependencies]
unicode_names2 = "0.2.1"
unicode_names2_macros = "0.2"
