#nlp #numbers #language #english #spanish #digits #french

text2num

Parse and convert numbers written in English, Dutch, Spanish, German, Italian or French into their digit representation

14 stable releases

2.5.1 Aug 29, 2024
2.4.1 Jul 18, 2024
2.1.4 Aug 29, 2023
2.1.3 Jun 15, 2023
1.7.0 Dec 6, 2021

#196 in Text processing


288 downloads per month

MIT license

170KB
4K SLoC


This crate provides a library for recognizing, parsing and transcribing into digits (base 10) numbers expressed in natural language.

Examples

Check that a string is a valid number in a given language

use text2num::{Language, text2digits};

let es = Language::spanish();
let utterance = "ochenta y cinco";

match text2digits(utterance, &es) {
    Ok(repr) => println!("'{}' means {} in Spanish", utterance, repr),
    Err(_) => println!("'{}' is not a number in Spanish", utterance)
}

When run, the code above should print "'ochenta y cinco' means 85 in Spanish" on standard output.

Find and replace numbers in a natural speech string

Most often, you just want to rewrite a string containing natural speech so that the numbers it contains (cardinals, ordinals, decimal numbers) appear in digit (base 10) form instead.

Because small isolated numbers may be easier to read when left as words, you can specify a threshold below which isolated simple cardinals and ordinals are not replaced.

use text2num::{Language, replace_numbers};

let en = Language::english();

let sentence = "Let me show you two things: first, isolated numbers are treated differently than groups like one, two, three. And then, that decimal numbers like three point one four one five are well understood.";

assert_eq!(
    replace_numbers(sentence, &en, 10.0),
    "Let me show you two things: first, isolated numbers are treated differently than groups like 1, 2, 3. And then, that decimal numbers like 3.1415 are well understood."
);

assert_eq!(
    replace_numbers(sentence, &en, 0.0),
    "Let me show you 2 things: 1st, isolated numbers are treated differently than groups like 1, 2, 3. And then, that decimal numbers like 3.1415 are well understood."
);
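The threshold behaviour seen in the two assertions above can be modelled with a small, self-contained sketch. This is not the crate's implementation: the `Found` struct and the `should_replace` function are hypothetical names introduced for illustration only, and the strict "below the threshold" comparison is an assumption drawn from the wording above. It also ignores decimal numbers, which the example shows being replaced in both cases.

```rust
/// Hypothetical model of a number found in text; not part of text2num.
struct Found {
    /// Numeric value of the spoken number.
    value: f64,
    /// True when the number stands alone rather than in a group such as "one, two, three".
    isolated: bool,
}

/// Assumption: an isolated number stays as words only when its value is
/// strictly below the threshold; grouped numbers are always replaced.
fn should_replace(found: &Found, threshold: f64) -> bool {
    !found.isolated || found.value >= threshold
}

fn main() {
    let isolated_two = Found { value: 2.0, isolated: true };
    let grouped_one = Found { value: 1.0, isolated: false };

    // With a threshold of 10.0, the isolated "two" is left as a word...
    assert!(!should_replace(&isolated_two, 10.0));
    // ...while "one" inside "one, two, three" is still replaced.
    assert!(should_replace(&grouped_one, 10.0));
    // With a threshold of 0.0, every number is replaced.
    assert!(should_replace(&isolated_two, 0.0));
}
```

Under this model, a threshold of 0.0 disables the exception entirely, which matches the second assertion, where "two things" becomes "2 things".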

For more advanced usage (e.g. on token streams), see the documentation.

Dependencies

~1.8–2.4MB
~45K SLoC