

UNIC — Unicode Character Database — General Category

5 releases (breaking)

0.9.0 Mar 3, 2019
0.8.0 Jan 2, 2019
0.7.0 Feb 7, 2018
0.6.0 Sep 22, 2017
0.5.0 Aug 5, 2017






UNIC — UCD — Category

A component of unic: Unicode and Internationalization Crates for Rust.

Unicode General_Category.

The General_Category property of a code point provides for the most general classification of that code point. It is usually determined based on the primary characteristic of the assigned character for that code point. For example, is the character a letter, a mark, a number, punctuation, or a symbol, and if so, of what type? Other General_Category values define the classification of code points which are not assigned to regular graphic characters, including such statuses as private-use, control, surrogate code point, and reserved unassigned.
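As a rough illustration of this kind of first-order classification, the sketch below sorts a `char` into a few coarse classes using only the Rust standard library. It does not use this crate's API: the `CoarseClass` enum and the `coarse_class` function are hypothetical names, and the `std` predicates only approximate the real General_Category groups (for example, `char::is_alphabetic` follows the Alphabetic property, which is broader than the Letter categories).

```rust
/// Hypothetical coarse classes, loosely mirroring a few major
/// General_Category groups. Not part of this crate's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CoarseClass {
    Letter,
    Number,
    Control,
    PrivateUse,
    Other,
}

/// Approximate the "most general classification" of a character with
/// standard-library predicates. This is an assumption-laden sketch,
/// not a lookup in the Unicode Character Database.
pub fn coarse_class(c: char) -> CoarseClass {
    let cp = c as u32;
    if c.is_control() {
        // char::is_control matches exactly the Cc (control) category.
        CoarseClass::Control
    } else if matches!(cp, 0xE000..=0xF8FF | 0xF0000..=0xFFFFD | 0x100000..=0x10FFFD) {
        // The three private-use ranges defined by the Unicode Standard.
        CoarseClass::PrivateUse
    } else if c.is_numeric() {
        // Covers Nd, Nl and No. Checked before is_alphabetic so that
        // letter-like numbers are reported as numbers.
        CoarseClass::Number
    } else if c.is_alphabetic() {
        CoarseClass::Letter
    } else {
        // Punctuation, symbols, separators, unassigned code points, etc.
        CoarseClass::Other
    }
}
```

Surrogate code points do not appear here because a Rust `char` can never hold one. A real implementation would look each code point up in generated UCD tables rather than chain predicates like this.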

Many characters have multiple uses, and not all such cases can be captured entirely by the General_Category value. For example, the General_Category value of Latin, Greek, or Hebrew letters does not attempt to cover (or preclude) the numerical use of such letters as Roman numerals or in other numerary systems. Conversely, the General_Category of ASCII digits 0..9 as Nd (decimal digit) neither attempts to cover (nor preclude) the occasional use of these digits as letters in various orthographies. The General_Category is simply the first-order, most usual categorization of a character.
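The Roman numeral case can be made concrete with a small sketch. The Latin letter `X` is classified as a letter, yet an application is free to give it a numeric reading. The functions `roman_digit_value` and `roman_to_u32` below are hypothetical helpers written for this illustration, using only the standard library; they are not part of this crate, and the parser is deliberately lenient (it does not reject malformed numerals such as `IIII`).

```rust
/// Numeric value of a single uppercase Latin letter when read as a
/// Roman numeral digit. Hypothetical helper for illustration only.
pub fn roman_digit_value(c: char) -> Option<u32> {
    match c {
        'I' => Some(1),
        'V' => Some(5),
        'X' => Some(10),
        'L' => Some(50),
        'C' => Some(100),
        'D' => Some(500),
        'M' => Some(1000),
        _ => None,
    }
}

/// Lenient Roman numeral reader: scans right to left and subtracts a
/// digit when it is smaller than the largest digit seen so far.
/// Returns None for letters with no Roman value or on arithmetic
/// overflow or underflow.
pub fn roman_to_u32(s: &str) -> Option<u32> {
    let mut total: u32 = 0;
    let mut largest_seen: u32 = 0;
    for c in s.chars().rev() {
        let v = roman_digit_value(c)?;
        if v < largest_seen {
            total = total.checked_sub(v)?;
        } else {
            total = total.checked_add(v)?;
            largest_seen = v;
        }
    }
    Some(total)
}
```

With these helpers, `'X'.is_alphabetic()` is true and `'X'.is_numeric()` is false, while `roman_digit_value('X')` still yields a number. The character classification and the application-level numeric use are independent, which is the point the paragraph above makes.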

For more information about the General_Category property, see Chapter 4, Character Properties in the Unicode Standard.

-- Unicode® Standard Annex #44 - Unicode Character Database