4 releases
Uses new Rust 2024
| Version | Released |
|---|---|
| 0.2.1 | Feb 14, 2026 |
| 0.2.0 | Jul 10, 2025 |
| 0.1.1 | Jul 6, 2025 |
| 0.1.0 | Jul 6, 2025 |
#1003 in Text processing
14,141 downloads per month
Used in rlibphonenumber
230KB
96 lines
Extended Decimal
A tiny, zero-cost Rust library to correctly parse any Unicode decimal digit.
Ever needed to parse a number from a string that might contain digits from other scripts, like ९ (Devanagari nine) or ٣ (Arabic-Indic three)? Rust's standard `char::to_digit` only recognizes ASCII digits (plus ASCII letters for radices above 10). This crate extends that capability to every Unicode character in the "Decimal Number (Nd)" category.
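To make the gap concrete, here is a small standard-library-only sketch. The helper name `std_view` is made up for illustration and is not part of this crate. It shows that `char::is_numeric` recognizes non-ASCII digits as numeric, while `char::to_digit` still returns `None` for them.

```rust
/// Reports how the standard library sees `c`:
/// (is it numeric at all?, its value as a base-10 digit).
/// `std_view` is a hypothetical helper, not part of dec_from_char.
fn std_view(c: char) -> (bool, Option<u32>) {
    (c.is_numeric(), c.to_digit(10))
}
```

For `'7'` this yields `(true, Some(7))`, but for `'९'` it yields `(true, None)`: the standard library knows the character is numeric yet gives no digit value for it.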
Features
- Blazing Fast: All Unicode mappings are resolved at compile time into a single `match` statement, so converting a character at runtime involves no table loading, hashing, or allocation.
- Simple API: Provides a straightforward extension trait, `DecimalExtended`, for the `char` type. If you know how to use Rust, you already know how to use this.
- Self-Contained: The necessary Unicode data is bundled into the crate, so you don't need to worry about external files or runtime downloads.
- Comprehensive: Correctly identifies and converts all decimal digits across various scripts as defined by the Unicode Standard.
Quick Start
1. Add `dec_from_char` to your `Cargo.toml`:

   ```toml
   [dependencies]
   dec_from_char = "0.2.0" # Replace with the latest version
   ```

2. Use the `DecimalExtended` trait to convert characters:

   ```rust
   use dec_from_char::DecimalExtended;

   fn main() {
       // Works for common ASCII digits
       assert_eq!('7'.to_decimal_utf8(), Some(7));

       // And for a wide range of other Unicode digits!
       assert_eq!('९'.to_decimal_utf8(), Some(9)); // Devanagari
       assert_eq!('०'.to_decimal_utf8(), Some(0)); // Devanagari
       assert_eq!('７'.to_decimal_utf8(), Some(7)); // Fullwidth
       assert_eq!('٣'.to_decimal_utf8(), Some(3)); // Arabic-Indic

       // It gracefully returns None for non-digit characters
       assert_eq!('a'.to_decimal_utf8(), None);
       assert_eq!('🎉'.to_decimal_utf8(), None);

       // Normalization to the matching ASCII digit
       assert_eq!('٣'.normalize_decimal(), Some('3'));
       assert_eq!('７'.normalize_decimal(), Some('7'));
       assert_eq!('🎉'.normalize_decimal(), None);
   }
   ```
Example: Parsing Numbers from a Mixed-Script String
This crate makes it trivial to extract numbers from text, no matter how they are formatted.
```rust
// Assumption: the two free functions are exported at the crate root.
use dec_from_char::{normalize_decimals, normalize_decimals_filtering, DecimalExtended};

let messy_string = "Phone number: (0)𝟗𝟖-𝟳𝟲𝟱 and pin: ٣-١-٤-١";

let digits: String = messy_string.chars()
    .filter_map(|c| c.normalize_decimal()) // Keep only digits, converted to ASCII
    .collect();
assert_eq!(digits, "0987653141");

// You can do the same with `normalize_decimals_filtering`
assert_eq!(normalize_decimals_filtering(messy_string), "0987653141");

// Or normalize the digits while keeping all other characters
assert_eq!(normalize_decimals(messy_string), "Phone number: (0)98-765 and pin: 3-1-4-1");

println!("Extracted digits: {}", digits); // "0987653141"
```
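If you need an integer rather than a string of digits, the same per-character conversion can feed an accumulator. The following is a minimal sketch under stated assumptions: `parse_u64_with` and `stand_in_digit` are hypothetical names, and `stand_in_digit` is only a tiny substitute (ASCII and Arabic-Indic) so the snippet compiles without the crate.

```rust
/// Folds every digit found in `text` into a `u64`, skipping other characters.
/// Returns `None` if there are no digits or the value overflows.
/// Hypothetical helper, not part of dec_from_char.
fn parse_u64_with(text: &str, to_digit: impl Fn(char) -> Option<u8>) -> Option<u64> {
    let mut value: u64 = 0;
    let mut seen_digit = false;
    for c in text.chars() {
        if let Some(d) = to_digit(c) {
            // Checked arithmetic turns overflow into `None` instead of a panic.
            value = value.checked_mul(10)?.checked_add(u64::from(d))?;
            seen_digit = true;
        }
    }
    seen_digit.then_some(value)
}

/// Stand-in for a real digit converter; covers only two scripts.
fn stand_in_digit(c: char) -> Option<u8> {
    match c {
        '0'..='9' => Some(c as u8 - b'0'),
        '\u{0660}'..='\u{0669}' => Some((c as u32 - 0x0660) as u8), // Arabic-Indic
        _ => None,
    }
}
```

With the crate in scope, passing `|c| c.to_decimal_utf8()` in place of `stand_in_digit` should cover every `Nd` digit.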
How It Works
This crate contains two main parts:
- A procedural macro that reads the official `UnicodeData.txt` file at compile time.
- An extension trait that uses the code generated by this macro.
When you compile your project, the macro scans the Unicode data file for every character in the decimal digit category (`Nd`). It then generates one large but efficient `match` statement that maps each of these characters to its `u8` value (0-9).
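The scanning step can be sketched roughly as follows. This is not the crate's actual macro code; `parse_nd_line` and `match_arm` are hypothetical helpers. It relies only on the published `UnicodeData.txt` layout: semicolon-separated fields, with the code point in field 0, the general category in field 2, and the decimal digit value in field 6.

```rust
/// Parses one `UnicodeData.txt` line and returns `(char, digit value)`
/// if the entry belongs to the `Nd` category. Hypothetical helper.
fn parse_nd_line(line: &str) -> Option<(char, u8)> {
    let fields: Vec<&str> = line.trim().split(';').collect();
    // Field 2 is General_Category; skip everything that is not a decimal digit.
    if fields.len() < 7 || fields[2] != "Nd" {
        return None;
    }
    // Field 0 is the code point in hexadecimal.
    let code_point = u32::from_str_radix(fields[0], 16).ok()?;
    let ch = char::from_u32(code_point)?;
    // Field 6 is the decimal digit value (0-9).
    let value = fields[6].parse::<u8>().ok()?;
    Some((ch, value))
}

/// Renders one arm of a generated `match`, e.g. `'\u{663}' => Some(3),`.
fn match_arm(ch: char, value: u8) -> String {
    format!("'\\u{{{:X}}}' => Some({}),", ch as u32, value)
}
```

A real procedural macro would emit tokens rather than strings, but the filtering logic would look similar.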
This generated code is compiled directly into your binary. At runtime, calling `.to_decimal_utf8()` is a single `match` on the character: no searching, no parsing, and no hash maps.
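For intuition, here is a hand-written miniature of what such a lookup could look like. It is an illustration only: `to_decimal_sketch` is a made-up name, it covers just five scripts, and it uses ranges where the real generated code may list characters differently.

```rust
/// Maps a few decimal-digit ranges to their numeric value.
/// Hand-written illustration, not the crate's generated code.
fn to_decimal_sketch(c: char) -> Option<u8> {
    // Each Nd block is ten consecutive code points starting at its zero digit.
    let zero = match c {
        '0'..='9' => '0',
        '\u{0660}'..='\u{0669}' => '\u{0660}', // Arabic-Indic
        '\u{06F0}'..='\u{06F9}' => '\u{06F0}', // Extended Arabic-Indic
        '\u{0966}'..='\u{096F}' => '\u{0966}', // Devanagari
        '\u{FF10}'..='\u{FF19}' => '\u{FF10}', // Fullwidth
        _ => return None,
    };
    Some((c as u32 - zero as u32) as u8)
}
```

Because the mapping is ordinary compiled code, the compiler is free to turn it into range checks or jump tables as it sees fit.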
API
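The core of the API is a single extension trait:
`pub trait DecimalExtended`
- `fn to_decimal_utf8(&self) -> Option<u8>`: Converts any Unicode decimal digit in the `Nd` category to a `u8`. Returns `None` if the character is not a decimal digit.
- `fn is_decimal_utf8(&self) -> bool`: A convenience method that returns `true` if the character is a decimal digit.

The examples above also use `normalize_decimal`, `normalize_decimals`, and `normalize_decimals_filtering`; see the crate documentation for their exact signatures.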
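If the extension-trait pattern is unfamiliar, the sketch below shows its general shape with a stand-in trait. `DecimalSketch` and its methods are hypothetical, cover only ASCII and Arabic-Indic digits, and assume that the boolean check is simply "conversion succeeded"; the crate's own trait may be implemented differently.

```rust
/// Stand-in extension trait illustrating the pattern; not the crate's trait.
trait DecimalSketch {
    fn to_decimal_sketch(&self) -> Option<u8>;

    /// Default method: a character is a digit if the conversion succeeds.
    fn is_decimal_sketch(&self) -> bool {
        self.to_decimal_sketch().is_some()
    }
}

// Implementing the trait for `char` adds the methods to every `char` value
// wherever the trait is in scope.
impl DecimalSketch for char {
    fn to_decimal_sketch(&self) -> Option<u8> {
        match *self {
            '0'..='9' => Some(*self as u8 - b'0'),
            '\u{0660}'..='\u{0669}' => Some((*self as u32 - 0x0660) as u8),
            _ => None,
        }
    }
}
```

This is why the real methods only become available after `use dec_from_char::DecimalExtended;`.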
License
This project is licensed under either of
- Apache License, Version 2.0, (LICENSE-APACHE)
- MIT license (LICENSE-MIT)
at your option.
Contributing
Contributions, issues, and feature requests are welcome! Feel free to check the issues page.