#unicode-characters #unicode #name #mapping #memory-mapping #run-time #lookup-tables

no-std unicode_names2

Map characters to and from their name given in the Unicode standard. This goes to great lengths to be as efficient as possible in both time and space, with the full bidirectional tables weighing barely 500 KB but still offering O(1)* look-up in both directions. (*more precisely, O(length of name).)

12 releases (5 stable)

1.2.2 Mar 10, 2024
1.2.1 Dec 14, 2023
1.2.0 Oct 14, 2023
0.6.0 Oct 13, 2022
0.2.2 Jun 17, 2018

#57 in Text processing

Download history 5208/week @ 2023-12-06 5203/week @ 2023-12-13 4214/week @ 2023-12-20 3924/week @ 2023-12-27 6132/week @ 2024-01-03 5670/week @ 2024-01-10 7166/week @ 2024-01-17 7514/week @ 2024-01-24 6285/week @ 2024-01-31 5811/week @ 2024-02-07 6983/week @ 2024-02-14 5381/week @ 2024-02-21 6164/week @ 2024-02-28 9187/week @ 2024-03-06 7194/week @ 2024-03-13 7179/week @ 2024-03-20

30,887 downloads per month
Used in 59 crates (16 directly)

(MIT OR Apache-2.0) AND Unicode-DFS-2016

305KB
1K SLoC

Rust 819 SLoC // 0.1% comments Python 192 SLoC // 0.0% comments

unicode_names2

Time- and memory-efficient mapping of characters to and from their Unicode 15.1 names, at runtime and compile-time.

fn main() {
    println!("☃ is called {:?}", unicode_names2::name('☃')); // SNOWMAN
    println!("{:?} is happy", unicode_names2::character("white smiling face")); // ☺
    // (NB. case insensitivity)
}

The maps are compressed using similar tricks to Python's unicodedata module, although those here are about 70KB (12%) smaller.

Documentation


lib.rs:

Convert between characters and their standard names.

This crate provides two functions for mapping between a char and the name given by the Unicode standard (15.1): name goes from a char to its name, and character goes from a name to a char. There are no runtime requirements, so the crate is usable with only core (this requires enabling the no_std cargo feature). The tables are heavily compressed but still large (500KB), and they offer efficient O(1) look-ups in both directions (more precisely, O(length of name)).

    println!("☃ is called {:?}", unicode_names2::name('☃')); // SNOWMAN
    println!("{:?} is happy", unicode_names2::character("white smiling face")); // ☺
    // (NB. case insensitivity)

Source.

Macros

The associated unicode_names2_macros crate provides two macros for converting at compile-time, giving named literals similar to Python's "\N{...}".

  • named_char!(name) takes a single string name and creates a char literal.
  • named!(string) takes a string and replaces any \\N{name} sequences with the character with that name. NB. String escape sequences cannot be customised, so either the extra backslash or a raw string is required.

#![feature(proc_macro_hygiene)]

#[macro_use]
extern crate unicode_names2_macros;

fn main() {
    let x: char = named_char!("snowman");
    assert_eq!(x, '☃');

    let y: &str = named!("foo bar \\N{BLACK STAR} baz qux");
    assert_eq!(y, "foo bar ★ baz qux");

    let y: &str = named!(r"foo bar \N{BLACK STAR} baz qux");
    assert_eq!(y, "foo bar ★ baz qux");
}
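
To make the \N{name} substitution concrete, here is a minimal runtime sketch of the kind of expansion named! performs at compile time. It does not use the crate: TABLE, lookup and expand_named are hypothetical names invented for this illustration, and the three-entry table with a linear scan stands in for the crate's compressed tables. The case-insensitive match mirrors the behaviour noted for unicode_names2::character.

```rust
/// Hypothetical stand-in for the crate's name table: a few (name, char) pairs.
const TABLE: &[(&str, char)] = &[
    ("BLACK STAR", '\u{2605}'),
    ("SNOWMAN", '\u{2603}'),
    ("WHITE SMILING FACE", '\u{263A}'),
];

/// Case-insensitive lookup in the toy table (linear scan, for illustration only).
fn lookup(name: &str) -> Option<char> {
    TABLE
        .iter()
        .find(|(n, _)| n.eq_ignore_ascii_case(name))
        .map(|&(_, c)| c)
}

/// Replace every `\N{name}` sequence in `input` with the named character.
/// Returns `None` if a name is unknown or a sequence has no closing brace.
fn expand_named(input: &str) -> Option<String> {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("\\N{") {
        // Copy the text before the escape, then resolve the name inside the braces.
        out.push_str(&rest[..start]);
        let after = &rest[start + 3..];
        let end = after.find('}')?;
        out.push(lookup(&after[..end])?);
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Some(out)
}

fn main() {
    // A raw string keeps the backslash literal, as in the named! example.
    let s = expand_named(r"foo bar \N{black star} baz qux").unwrap();
    println!("{}", s);
}
```

Unlike this sketch, the macros resolve names during compilation, so the lookup has no runtime cost.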

Cargo-enabled

This package is on crates.io, so add either (or both!) of the following to your Cargo.toml.

[dependencies]
unicode_names2 = "1.2"
unicode_names2_macros = "0.2"

Dependencies