5 releases (1 stable)

new 1.0.0 May 4, 2021
0.3.0 Mar 13, 2021
0.2.0 Apr 7, 2020
0.1.1 Apr 6, 2020
0.1.0 Apr 2, 2020

#70 in Text processing

Download history (downloads per week):
2021-01-13: 218
2021-01-20: 108
2021-01-27: 220
2021-02-03: 288
2021-02-10: 332
2021-02-17: 176
2021-02-24: 373
2021-03-03: 306
2021-03-10: 256
2021-03-17: 440
2021-03-24: 381
2021-03-31: 184
2021-04-07: 190
2021-04-14: 201
2021-04-21: 225
2021-04-28: 393

1,247 downloads per month
Used in broot

MIT license

1MB
66K SLoC

Secular

Secular provides a lowercased, diacritics-free version of a character or a string.

For example, it returns e for é.

Secular's char lookup is an inlined lookup in a static table, which makes it usable in performance-sensitive code.
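To illustrate what an inlined static-table lookup can look like, here is a minimal sketch using only the standard library. It is not Secular's actual table or code: `FOLD_START`, `FOLD_TABLE` and `fold_char` are hypothetical names, and the toy table covers only the sixteen code points U+00E0 to U+00EF.

```rust
/// First code point covered by the toy table (hypothetical, for illustration).
const FOLD_START: u32 = 0xE0;

/// Direct-indexed table for U+00E0..=U+00EF: à á â ã ä å æ ç è é ê ë ì í î ï.
/// This is NOT Secular's table, only a miniature stand-in.
static FOLD_TABLE: [char; 16] = [
    'a', 'a', 'a', 'a', 'a', 'a', 'æ', 'c',
    'e', 'e', 'e', 'e', 'i', 'i', 'i', 'i',
];

/// Returns a lowercased, diacritics-free version of `c` when the toy table
/// knows it, and `c` unchanged otherwise.
#[inline(always)]
pub fn fold_char(c: char) -> char {
    if c.is_ascii() {
        // ASCII needs no table: lowercasing is enough.
        return c.to_ascii_lowercase();
    }
    // Code points below FOLD_START wrap around to a huge index,
    // so `get` returns None for them as well as for code points past the table.
    let idx = (c as u32).wrapping_sub(FOLD_START) as usize;
    match FOLD_TABLE.get(idx) {
        Some(&folded) => folded,
        None => c,
    }
}
```

The lookup is one subtraction, one bounds check and one array read, with no allocation or hashing. That is the property that makes this kind of design suitable for hot loops.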

Optionally, Secular also performs Unicode normalization.

A common use case for removing diacritics and some Unicode artifacts is to make searches easier:

(Screenshot of a diacritics-ignoring, normalized search in broot: the user typed rève)
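A diacritics-insensitive search can be sketched by folding both the text and the query before comparing them. The helper below, `folded_contains`, is a hypothetical name and is not taken from broot or Secular. It takes the folding function as a parameter, so a `char -> char` function such as `secular::lower_lay_char` from the Usage section could be passed in. The sketch itself depends only on the standard library.

```rust
/// Returns true if `needle` occurs in `haystack` once both strings have been
/// folded character by character with `fold`.
/// Hypothetical helper for illustration; not part of Secular or broot.
pub fn folded_contains(
    haystack: &str,
    needle: &str,
    fold: impl Fn(char) -> char,
) -> bool {
    // Fold every char of both strings, then do a plain substring search.
    let folded_haystack: String = haystack.chars().map(&fold).collect();
    let folded_needle: String = needle.chars().map(&fold).collect();
    folded_haystack.contains(folded_needle.as_str())
}
```

With a fold that maps è and ê to e, a query of rève would match a file named Rêve.txt. For a real application, it would likely be better to fold the haystack once and cache the result than to fold it on every query.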

Declaration

By default, diacritics removal is only done on ASCII chars, so that only a small table is included.

If you want to handle the whole BMP, use the "bmp" feature (the downside is that the binary is bigger, as it includes a big map).

Default import:

[dependencies]
secular = "0.3"

For more characters (the BMP):

[dependencies]
secular = { version="0.3", features=["bmp"] }

With Unicode normalization functions (using the unicode-normalization crate):

[dependencies]
secular = { version="0.3", features=["normalization"] }

or

[dependencies]
secular = { version="0.3", features=["bmp","normalization"] }

The normalization feature is optional so that you can avoid importing the unicode-normalization crate. Note that many other crates use it, so your text processing application may already depend on it.

Usage

On characters:

use secular::*;
let s = "Comunicações"; // normalized string (length=12)
let chars: Vec<char> = s.chars().collect();
assert_eq!(chars.len(), 12);
assert_eq!(chars[0], 'C');
assert_eq!(lower_lay_char(chars[0]), 'c');
assert_eq!(chars[8], 'ç');
assert_eq!(lower_lay_char(chars[8]), 'c');

On strings:

use secular::*;
let s = "Comunicações"; // unnormalized string (length=14)
assert_eq!(s.chars().count(), 14);
let s = normalized_lower_lay_string(s);
assert_eq!(s.chars().count(), 12);
assert_eq!(s, "comunicacoes");
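
The two lengths above differ because the same visible text can be encoded with precomposed code points or with base letters followed by combining marks. The sketch below spells out both forms with escapes and uses only the standard library. The decomposed spelling is an assumption about what the unnormalized example string contains, since the two forms look identical on screen. `COMPOSED`, `DECOMPOSED` and `char_len` are hypothetical names.

```rust
/// "Comunicações" with precomposed ç (U+00E7) and õ (U+00F5): 12 chars.
pub const COMPOSED: &str = "Comunica\u{e7}\u{f5}es";

/// The same visible text with combining marks:
/// c + U+0327 (combining cedilla), o + U+0303 (combining tilde): 14 chars.
pub const DECOMPOSED: &str = "Comunicac\u{327}o\u{303}es";

/// Number of Unicode scalar values in `s` (not bytes, not grapheme clusters).
pub fn char_len(s: &str) -> usize {
    s.chars().count()
}
```

The two constants render identically but compare as different strings. This is why a per-char diacritics table alone is not enough for unnormalized input, and why composing the string first helps.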

Dependencies