0.5.0 May 10, 2022
0.4.1 Jan 31, 2022
0.4.0 Nov 1, 2021
0.3.0 Jul 30, 2021
0.1.0 Oct 15, 2020


Used in 3 crates (2 directly)

Custom license

1MB
13K SLoC

icu_uniset

icu_uniset is a utility crate of the ICU4X project.

This API provides necessary functionality for highly efficient querying of sets of Unicode characters.

It is an implementation of the existing ICU4C UnicodeSet API.

Architecture

ICU4X UnicodeSet is split into independent levels: UnicodeSet provides the membership/query API, and UnicodeSetBuilder provides the builder API. A Properties API is planned as future work.
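To illustrate why a builder/set split is convenient, here is a minimal sketch in plain Rust. It assumes a design in which the builder accepts ranges in any order and normalizes them once, producing an immutable set that only needs to be queried. The names `RangeSetBuilder`, `RangeSet`, and `bounds` are hypothetical and are not part of icu_uniset; the crate's own internals may differ.

```rust
/// Hypothetical builder: collects ranges in any order and normalizes on build.
#[derive(Default)]
pub struct RangeSetBuilder {
    ranges: Vec<(u32, u32)>, // half-open [start, end) code point ranges
}

/// Hypothetical immutable set: sorted boundaries start, end, start, end, ...
pub struct RangeSet {
    bounds: Vec<u32>,
}

impl RangeSetBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Adds the inclusive range first..=last; empty ranges are ignored.
    pub fn add_range(&mut self, first: char, last: char) {
        if first <= last {
            self.ranges.push((first as u32, last as u32 + 1));
        }
    }

    /// Sorts the collected ranges and merges overlapping or adjacent ones.
    pub fn build(mut self) -> RangeSet {
        self.ranges.sort_unstable();
        let mut bounds: Vec<u32> = Vec::new();
        for (start, end) in self.ranges {
            // The last boundary, if any, is always the end of the previous run.
            if let Some(prev_end) = bounds.last_mut() {
                if start <= *prev_end {
                    *prev_end = (*prev_end).max(end);
                    continue;
                }
            }
            bounds.push(start);
            bounds.push(end);
        }
        RangeSet { bounds }
    }
}

impl RangeSet {
    /// Exposes the normalized boundaries for inspection.
    pub fn bounds(&self) -> &[u32] {
        &self.bounds
    }
}
```

In this sketch all sorting and merging happens in `build`, so the resulting set can stay read-only and compact.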

Examples:

Creating a UnicodeSet

A UnicodeSet can be created in three ways: from a serialized UnicodeSet (represented as an inversion list), with the UnicodeSetBuilder, or from the Properties API, which is not yet available.

use icu_uniset::{UnicodeSet, UnicodeSetBuilder};

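let mut builder = UnicodeSetBuilder::new();
// 'A'..'Z' is a half-open Rust range, so 'Z' itself is excluded;
// write 'A'..='Z' to include it.
builder.add_range(&('A'..'Z'));
// build() consumes the builder and returns the finished set.
let set: UnicodeSet = builder.build();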

assert!(set.contains('A'));
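The inversion-list representation mentioned above can be sketched in a few lines of plain Rust. This is an illustration of the general technique, not the icu_uniset implementation: the type `InversionList` and its methods are hypothetical. The list stores sorted boundaries, where each even index starts a run of members and the following odd index ends it (exclusive), so a membership test is a single binary search.

```rust
/// Hypothetical minimal inversion list for illustration only.
/// `bounds` holds sorted code points: start, end, start, end, ...
/// with each end exclusive.
pub struct InversionList {
    bounds: Vec<u32>,
}

impl InversionList {
    /// Builds a list from sorted, non-overlapping half-open ranges [start, end).
    pub fn from_ranges(ranges: &[(u32, u32)]) -> Self {
        let mut bounds = Vec::with_capacity(ranges.len() * 2);
        for &(start, end) in ranges {
            bounds.push(start);
            bounds.push(end);
        }
        InversionList { bounds }
    }

    /// A code point is a member when an odd number of boundaries
    /// are less than or equal to it, i.e. it falls inside a run.
    pub fn contains(&self, c: char) -> bool {
        let cp = c as u32;
        let idx = self.bounds.partition_point(|&b| b <= cp);
        idx % 2 == 1
    }
}
```

Because only the run boundaries are stored, a set covering a large contiguous block of code points takes the same space as a set covering a single character.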

Querying a UnicodeSet

Currently, you can check whether a single character or a range of characters is contained in the UnicodeSet, and you can iterate over the characters in the set.

use icu_uniset::{UnicodeSet, UnicodeSetBuilder};

let mut builder = UnicodeSetBuilder::new();
builder.add_range(&('A'..'Z'));
let set: UnicodeSet = builder.build();

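// Single-character membership.
assert!(set.contains('A'));
// Range membership: every character from 'A' through 'C' must be in the set.
assert!(set.contains_range(&('A'..='C')));
// Iteration yields the characters of the set, starting with 'A'.
assert_eq!(set.iter_chars().next(), Some('A'));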
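As a rough sketch of how range queries and iteration can work on an inversion list, the two free functions below operate directly on a sorted boundary slice (start, end, start, end, ... with exclusive ends). The function names mirror the queries above for readability, but the signatures and bodies are assumptions made for illustration, not the crate's code.

```rust
/// Returns true when every code point in the inclusive range lo..=hi
/// lies inside a single run of the boundary slice.
pub fn contains_range(bounds: &[u32], lo: char, hi: char) -> bool {
    let (lo, hi) = (lo as u32, hi as u32);
    if lo > hi {
        return false;
    }
    // An odd index means `lo` falls inside a run that ends at bounds[idx].
    let idx = bounds.partition_point(|&b| b <= lo);
    idx % 2 == 1 && bounds.get(idx).map_or(false, |&end| hi < end)
}

/// Iterates over every character in the set, in ascending order.
/// Code points that are not valid `char`s (surrogates) are skipped.
pub fn iter_chars(bounds: &[u32]) -> impl Iterator<Item = char> + '_ {
    bounds
        .chunks_exact(2)
        .flat_map(|pair| pair[0]..pair[1])
        .filter_map(char::from_u32)
}
```

With the boundaries `['A' as u32, 'Z' as u32]`, which correspond to the half-open range `'A'..'Z'`, a query for `'A'..='C'` succeeds while a query for `'A'..='Z'` fails, because `'Z'` is the exclusive end of the run.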

More Information

For more information on development, authorship, contributing, etc., please visit the ICU4X home page.

Dependencies

~0.4–1MB
~24K SLoC