12 stable releases
Version | Released |
---|---|
1.5.0 (new) | Dec 8, 2024 |
1.4.0 | Jul 21, 2023 |
1.3.2 | Jun 2, 2023 |
1.3.1 | Sep 17, 2022 |
1.0.0 | Aug 8, 2020 |
#74 in Text processing
1,046 downloads per month
Used in spongebobizer
220KB, 5K SLoC
focaccia
Unicode case folding methods for case-insensitive string comparisons. Used to implement case folding operations on the Symbol and String classes in the Ruby Core implementation in Artichoke Ruby.
Focaccia supports full, ASCII, and Turkic Unicode case folding equality and ordering comparisons.
One of the most common things that software developers do is "normalize" text for the purposes of comparison. And one of the most basic ways that developers are taught to normalize text for comparison is to compare it in a "case insensitive" fashion. In other cases, developers want to compare strings in a case sensitive manner. Unicode defines upper, lower, and title case properties for characters, plus special cases that impact specific language's use of text. (W3C, Case Folding)
focaccia is a flat Italian bread. The focaccia crate compares UTF-8 strings by flattening them to folded downcase. Artichoke goes well with focaccia.
Usage
Add this to your Cargo.toml:
[dependencies]
focaccia = "1.5.0"
Then make case-insensitive string comparisons like:
use core::cmp::Ordering;
use focaccia::CaseFold;
let fold = CaseFold::Full;
assert_eq!(fold.casecmp("MASSE", "Maße"), Ordering::Equal);
assert_eq!(fold.casecmp("São Paulo", "Sao Paulo"), Ordering::Greater);
assert!(fold.case_eq("MASSE", "Maße"));
assert!(!fold.case_eq("São Paulo", "Sao Paulo"));
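To illustrate why "MASSE" and "Maße" compare equal under full folding while "São Paulo" and "Sao Paulo" do not, here is a minimal sketch using only the standard library. It is not Focaccia's implementation: the function names are hypothetical, it special-cases only ß (which full case folding maps to "ss"), it approximates every other mapping with `char::to_lowercase`, and it allocates a `String`, which Focaccia's `no_std` configuration does not do.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: approximate full case folding for illustration only.
/// Only U+00DF (sharp s) gets a hand-written multi-character fold; all other
/// characters fall back to `char::to_lowercase`, which is NOT identical to
/// Unicode case folding.
fn sketch_full_fold(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            // Full case folding expands sharp s to two characters.
            '\u{00DF}' => out.push_str("ss"),
            _ => out.extend(c.to_lowercase()),
        }
    }
    out
}

/// Hypothetical helper: order two strings by their folded forms.
fn sketch_full_casecmp(left: &str, right: &str) -> Ordering {
    sketch_full_fold(left).cmp(&sketch_full_fold(right))
}
```

Folding does not strip diacritics, so "ã" stays distinct from "a" and sorts after it by code point.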
For text known to be ASCII, Focaccia can make a faster comparison:
use core::cmp::Ordering;
use focaccia::CaseFold;
let fold = CaseFold::Ascii;
assert_eq!(fold.casecmp("Crate: focaccia", "Crate: FOCACCIA"), Ordering::Equal);
assert_eq!(fold.casecmp("Fabled", "failed"), Ordering::Less);
assert!(fold.case_eq("Crate: focaccia", "Crate: FOCACCIA"));
assert!(!fold.case_eq("Fabled", "failed"));
ASCII case comparisons can also be made on byte slices:
use core::cmp::Ordering;
use focaccia::{ascii_casecmp, ascii_case_eq};
assert_eq!(ascii_casecmp(b"Artichoke Ruby", b"artichoke ruby"), Ordering::Equal);
assert!(ascii_case_eq(b"Artichoke Ruby", b"artichoke ruby"));
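For intuition about what an ASCII-only comparison involves, the following sketch does the same kind of check with the standard library alone. The function names are hypothetical and this is not how Focaccia is necessarily implemented; it only shows that ASCII folding needs no lookup tables and no allocation.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: order two byte slices after lowercasing ASCII letters.
/// Bytes outside the ASCII letter range are compared unchanged.
fn sketch_ascii_casecmp(left: &[u8], right: &[u8]) -> Ordering {
    let left = left.iter().map(u8::to_ascii_lowercase);
    let right = right.iter().map(u8::to_ascii_lowercase);
    // Lexicographic comparison over the lazily folded bytes.
    left.cmp(right)
}

/// Hypothetical helper: equality check ignoring ASCII case.
fn sketch_ascii_case_eq(left: &[u8], right: &[u8]) -> bool {
    left.eq_ignore_ascii_case(right)
}
```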
Turkic case folding is similar to full case folding with additional mappings for dotted and dotless I:
use core::cmp::Ordering;
use focaccia::CaseFold;
let fold = CaseFold::Turkic;
assert_eq!(fold.casecmp("İstanbul", "istanbul"), Ordering::Equal);
assert_ne!(fold.casecmp("İstanbul", "Istanbul"), Ordering::Equal);
assert!(fold.case_eq("İstanbul", "istanbul"));
assert!(!fold.case_eq("İstanbul", "Istanbul"));
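The two extra Turkic mappings can be sketched as follows. In CaseFolding.txt the `T` entries fold U+0049 (I) to U+0131 (dotless ı) and U+0130 (dotted İ) to U+0069 (i). The function names here are hypothetical, every other character is approximated with `char::to_lowercase` rather than real case folding, and the sketch allocates, so treat it as an illustration rather than Focaccia's code.

```rust
/// Hypothetical helper: approximate Turkic case folding for illustration only.
fn sketch_turkic_fold(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            // LATIN CAPITAL LETTER I folds to dotless i (U+0131).
            'I' => out.push('\u{0131}'),
            // LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130) folds to plain i.
            '\u{0130}' => out.push('i'),
            // Assumption: lowercase stands in for folding everywhere else.
            _ => out.extend(c.to_lowercase()),
        }
    }
    out
}

/// Hypothetical helper: equality under the sketch folding above.
fn sketch_turkic_case_eq(left: &str, right: &str) -> bool {
    sketch_turkic_fold(left) == sketch_turkic_fold(right)
}
```

Under these mappings "Istanbul" folds to "ıstanbul", which is why it does not match "İstanbul".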
Implementation
Focaccia generates conversion tables from Unicode Data Files. Focaccia implements case folding as defined in the Unicode standard (see CaseFolding.txt).
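To give a sense of the input to table generation, each data line in CaseFolding.txt has the form `code; status; mapping; # name`, where the status is `C`, `F`, `S`, or `T` and the mapping is one or more hexadecimal code points. The sketch below parses a single line of that form. The `FoldEntry` type and function names are hypothetical, and the crate's actual generator is a separate script, so this is only an illustration of the file format.

```rust
/// Hypothetical record for one CaseFolding.txt entry.
#[derive(Debug, PartialEq)]
struct FoldEntry {
    code: char,
    status: char,
    mapping: Vec<char>,
}

/// Parse a hexadecimal code point such as "00DF" into a `char`.
fn parse_hex_char(hex: &str) -> Option<char> {
    u32::from_str_radix(hex, 16).ok().and_then(char::from_u32)
}

/// Parse one line; returns `None` for blank, comment-only, or malformed lines.
fn parse_fold_line(line: &str) -> Option<FoldEntry> {
    // Everything after '#' is a comment.
    let data = line.split('#').next()?.trim();
    if data.is_empty() {
        return None;
    }
    let mut fields = data.split(';').map(str::trim);
    let code = parse_hex_char(fields.next()?)?;
    let status = fields.next()?.chars().next()?;
    // A full (F) mapping may contain several space-separated code points.
    let mapping = fields
        .next()?
        .split_whitespace()
        .map(parse_hex_char)
        .collect::<Option<Vec<char>>>()?;
    Some(FoldEntry { code, status, mapping })
}
```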
no_std
Focaccia is no_std compatible with an optional, enabled-by-default dependency on std. Focaccia does not link to alloc in its no_std configuration.
Crate features
All features are enabled by default.
- std - Enable linking to the Rust Standard Library. Enabling this feature adds Error implementations to error types in this crate.
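Since std is a default feature, a no_std build would typically opt out through Cargo's standard `default-features` key. The fragment below is a sketch under that assumption, reusing the version from the Usage section:

```toml
[dependencies]
# Assumption: turning off default features disables the `std` feature,
# which also drops the Error implementations on this crate's error types.
focaccia = { version = "1.5.0", default-features = false }
```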
Minimum Supported Rust Version
This crate requires at least Rust 1.76.0. This version can be bumped in minor releases.
Unicode Version
Focaccia implements Unicode case folding with the Unicode 16.0.0 case folding ruleset.
Each new release of Unicode may bring updates to CaseFolding.txt, which is the source for the folding mappings in this crate. Updates to the case folding rules will be accompanied by a minor version bump.
License
focaccia is licensed under the MIT License (c) Ryan Lopopolo.
focaccia includes Unicode Data Files which are subject to the Unicode Terms of Use and Unicode License v3 (c) 1991-2024 Unicode, Inc.
Generated files in this repository are marked with // @generated comments and the Unicode copyright. These generated files incorporate data derived from the Unicode Data Files. More details about the generation process can be found in scripts/gen_case_lookups.rb. The generated sources created by this script are subject to both the MIT License contained in this repository and the Unicode License v3.