10 stable releases

1.0.9 Mar 15, 2024
1.0.8 Dec 18, 2018
1.0.7 Aug 24, 2018
1.0.6 Jan 23, 2018
1.0.3 Dec 27, 2016

#383 in Text processing

35,287 downloads per month
Used in 26 crates (via polars-ops)

MIT/Apache

10KB
63 lines

unicode-reverse

Unicode-aware in-place string reversal for Rust UTF-8 strings.

The reverse_grapheme_clusters_in_place function reverses a string slice in-place without allocating any memory on the heap. It correctly handles multi-byte UTF-8 sequences and grapheme clusters, including combining marks and astral characters such as Emoji.

Example

use unicode_reverse::reverse_grapheme_clusters_in_place;

let mut x = "man\u{0303}ana".to_string();
println!("{}", x); // prints "mañana"

reverse_grapheme_clusters_in_place(&mut x);
println!("{}", x); // prints "anañam"

Background

As described in this article by Mathias Bynens, naively reversing a Unicode string can go wrong in several ways. For example, merely reversing the chars (Unicode Scalar Values) in a string can cause combining marks to become attached to the wrong characters:

let x = "man\u{0303}ana";
println!("{}", x); // prints "mañana"

let y: String = x.chars().rev().collect();
println!("{}", y); // prints "anãnam": Oops! The '~' is now applied to the 'a'.

Reversing the grapheme clusters of the string fixes this problem:

// Requires the unicode-segmentation crate as a dependency.
use unicode_segmentation::UnicodeSegmentation;

fn main() {
    let x = "man\u{0303}ana";
    let y: String = x.graphemes(true).rev().collect();
    println!("{}", y); // prints "anañam"
}

The reverse_grapheme_clusters_in_place function from this crate performs the same operation, but reverses the string in-place rather than allocating a new one.

Note: Even grapheme-level reversal may produce unexpected output if the input string contains certain non-printable control codes, such as directional formatting characters. Handling such characters is outside the scope of this crate.

Algorithm

The implementation is very simple. It makes two passes over the string's contents:

  1. For each grapheme cluster, reverse the bytes within the grapheme cluster in-place.
  2. Reverse the bytes of the entire string in-place.

After the second pass, each grapheme cluster has been reversed twice, so its bytes are now back in their original order, but the clusters are now in the opposite order within the string.
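The two passes can be sketched with the standard library alone. This is a minimal illustration, not the crate's actual code: the names reverse_clusters_sketch, first_cluster_len, and is_combining_mark are hypothetical, and the cluster rule is deliberately simplified to "one base char plus any following combining marks in U+0300..=U+036F", whereas real grapheme segmentation follows the full Unicode rules. The sketch also takes a &mut String and temporarily moves its buffer out, so that the bytes can be invalid UTF-8 between the two passes without unsafe code.

```rust
/// Hypothetical, simplified cluster test. Real grapheme segmentation
/// covers far more than this one block of combining diacritical marks.
fn is_combining_mark(c: char) -> bool {
    ('\u{0300}'..='\u{036F}').contains(&c)
}

/// Byte length of the first simplified cluster in `s`: one base char
/// plus any combining marks that immediately follow it.
fn first_cluster_len(s: &str) -> usize {
    let mut chars = s.char_indices();
    chars.next(); // skip the base character
    chars
        .find(|&(_, c)| !is_combining_mark(c))
        .map_or(s.len(), |(i, _)| i)
}

/// Reverses the simplified clusters of `s`, reusing its existing buffer.
fn reverse_clusters_sketch(s: &mut String) {
    // Move the buffer out so its bytes may be invalid UTF-8 for a while.
    let mut bytes = std::mem::take(s).into_bytes();

    // Pass 1: reverse the bytes inside each cluster.
    let mut start = 0;
    while start < bytes.len() {
        // Bytes from `start` onward are untouched, so they are still valid
        // UTF-8. Re-validating the tail each time keeps the sketch safe,
        // at the cost of quadratic work.
        let rest = std::str::from_utf8(&bytes[start..]).expect("tail is valid UTF-8");
        let len = first_cluster_len(rest);
        bytes[start..start + len].reverse();
        start += len;
    }

    // Pass 2: reverse the whole buffer. Each cluster's bytes are now back
    // in their original order, but the clusters appear in reverse order.
    bytes.reverse();

    *s = String::from_utf8(bytes).expect("double reversal restores valid UTF-8");
}
```

Calling reverse_clusters_sketch on a String holding "man\u{0303}ana" leaves it holding "anan\u{0303}am", which displays as "anañam". Inputs whose clusters fall outside the simplified rule (for example, emoji joined with zero-width joiners) would need proper grapheme segmentation.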

no_std

This crate does not depend on libstd, so it can be used in no_std projects.

Dependencies

~555KB