7 releases

new 0.0.7 Dec 8, 2024
0.0.6 Dec 8, 2024
0.0.5 Jun 2, 2023
0.0.4 Oct 1, 2022
0.0.2 Feb 15, 2021

#766 in Text processing

364 downloads per month

MIT AND Unicode-3.0

370KB
3.5K SLoC

roe

Implements Unicode case mapping for conventionally UTF-8 binary strings.

Case mapping or case conversion is a process whereby strings are converted to a particular form—uppercase, lowercase, or titlecase—possibly for display to the user.

roe can convert conventionally UTF-8 binary strings to capitalized, lowercase, and uppercase forms. This crate is used to implement String#capitalize, Symbol#capitalize, String#downcase, Symbol#downcase, String#upcase, and Symbol#upcase in Artichoke Ruby.

This crate depends on bstr.
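A "conventionally UTF-8" byte string is usually valid UTF-8 but is allowed to contain arbitrary bytes. The following is a minimal sketch of what that means for case mapping, using only the standard library. It is not roe's implementation, and `ascii_lowercase_bytes` is a hypothetical helper name: it lowercases ASCII letters and passes every other byte through unchanged, so invalid UTF-8 survives the conversion.

```rust
/// Hypothetical helper (not part of roe): lowercases only ASCII letters and
/// passes every other byte through unchanged, so bytes that are not valid
/// UTF-8 are preserved rather than rejected or replaced.
fn ascii_lowercase_bytes(input: &[u8]) -> Vec<u8> {
    input.iter().map(u8::to_ascii_lowercase).collect()
}
```

Under these assumptions, `ascii_lowercase_bytes(b"ABC\xFFDEF")` keeps the `0xFF` byte in place even though it can never appear in valid UTF-8, which is the behavior a `&str`-based API cannot offer.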

Implementation

Roe generates conversion tables from Unicode Data Files. Roe implements case mapping as defined in the Unicode standard (see PropList.txt, SpecialCasing.txt, UnicodeData.txt).
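To illustrate the table-driven approach, here is a small sketch of how a generated mapping table might be consulted. The table layout, the names `SPECIAL_UPPER`, `uppercase_char`, and `uppercase_str`, and the fallback to the standard library are all assumptions made for this example; roe's actual generated tables may be organized quite differently. The three entries are multi-character uppercase mappings listed in SpecialCasing.txt.

```rust
/// Hypothetical excerpt of a generated table: code points whose uppercase
/// form expands to more than one character, sorted by key so that it can be
/// binary searched. The layout is illustrative only.
const SPECIAL_UPPER: &[(char, &str)] = &[
    ('\u{00DF}', "SS"),  // LATIN SMALL LETTER SHARP S
    ('\u{FB01}', "FI"),  // LATIN SMALL LIGATURE FI
    ('\u{FB03}', "FFI"), // LATIN SMALL LIGATURE FFI
];

/// Uppercases one character, consulting the table first and falling back to
/// the standard library's mapping for everything else.
fn uppercase_char(c: char) -> String {
    match SPECIAL_UPPER.binary_search_by_key(&c, |&(key, _)| key) {
        Ok(idx) => SPECIAL_UPPER[idx].1.to_string(),
        Err(_) => c.to_uppercase().collect(),
    }
}

/// Uppercases a whole string one character at a time.
fn uppercase_str(s: &str) -> String {
    s.chars().map(uppercase_char).collect()
}
```

The standard library already knows these particular mappings, so the table here is redundant in practice; it only shows the lookup shape that a table generated from the Unicode Data Files could take.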

Status

This crate is currently a work in progress. When the API is complete, Roe will support lowercase, uppercase, titlecase, and case folding iterators for conventionally UTF-8 byte slices.

Roe will implement support for full, Turkic, ASCII, and case folding transforms.
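The difference between the full, Turkic, and ASCII transforms can be sketched with the standard library alone. The `Mode` enum and the functions below are hypothetical and exist only for this illustration; they are not roe's types, and the Turkic branch is deliberately simplified to the two single-character rules for dotted and dotless I. Case folding is left out of the sketch.

```rust
/// Hypothetical mode enum for this sketch; roe's own mode types may differ.
#[derive(Clone, Copy)]
enum Mode {
    Full,
    Ascii,
    Turkic,
}

/// Lowercases one character according to the selected mode.
fn lowercase_char(c: char, mode: Mode) -> String {
    match (mode, c) {
        // ASCII mode touches only A-Z and leaves all other characters alone.
        (Mode::Ascii, _) => c.to_ascii_lowercase().to_string(),
        // Turkic languages pair dotless I with U+0131 and dotted U+0130 with i.
        (Mode::Turkic, 'I') => "\u{0131}".to_string(),
        (Mode::Turkic, '\u{0130}') => "i".to_string(),
        // Everything else uses the default Unicode mapping from std.
        _ => c.to_lowercase().collect(),
    }
}

/// Lowercases a whole string one character at a time.
fn lowercase_str(s: &str, mode: Mode) -> String {
    s.chars().map(|c| lowercase_char(c, mode)).collect()
}
```

With these definitions, the same input gives different results per mode: `"ISTANBUL"` becomes `"istanbul"` under `Mode::Full` but starts with a dotless `ı` under `Mode::Turkic`, and `Mode::Ascii` leaves a character such as `À` untouched.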

Usage

Add this to your Cargo.toml:

[dependencies]
roe = "0.0.7"

Then convert case like:

use roe::{LowercaseMode, UppercaseMode, TitlecaseMode};

assert_eq!(
    roe::lowercase(b"Artichoke Ruby", LowercaseMode::Ascii).collect::<Vec<_>>(),
    b"artichoke ruby"
);
assert_eq!(
    roe::uppercase("Αύριο".as_bytes(), UppercaseMode::Full).collect::<Vec<_>>(),
    "ΑΎΡΙΟ".as_bytes()
);
assert_eq!(
    // U+FB03 LATIN SMALL LIGATURE FFI titlecases to the three characters "Ffi".
    roe::titlecase("\u{FB03}".as_bytes(), TitlecaseMode::Full).collect::<Vec<_>>(),
    "Ffi".as_bytes()
);

Crate Features

roe is no_std compatible with an optional dependency on the alloc crate.

roe has two Cargo features, both of which are enabled by default:

  • std - Adds a dependency on std, the Rust Standard Library. This feature enables std::error::Error implementations on error types in this crate. Enabling the std feature also enables the alloc feature.
  • alloc - Adds a dependency on alloc, the Rust allocation and collections library. This feature enables APIs that allocate String or Vec.
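As a configuration sketch, a no_std build that still wants the allocating APIs could disable the default features and re-enable only alloc. The feature names come from the list above and `default-features` is standard Cargo syntax; whether this exact combination suits a given target is an assumption to verify against the crate documentation.

```toml
[dependencies]
# Drop the default `std` feature but keep APIs that allocate String or Vec.
roe = { version = "0.0.7", default-features = false, features = ["alloc"] }
```

Omitting the `features` key entirely would build the crate with neither std nor alloc.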

Unicode Version

Roe implements Unicode case mapping with the Unicode 16.0.0 case mapping ruleset.

Each new release of Unicode may bring updates to the Data Files, which are the source for the case mappings in this crate. Updates to the case mapping rules will be accompanied by a minor version bump.

License

roe is licensed under the MIT License (c) Ryan Lopopolo.

roe includes Unicode Data Files which are subject to the Unicode Terms of Use and Unicode License v3 (c) 1991-2024 Unicode, Inc.

Dependencies

~465–620KB