23 releases (stable)
Version | Released |
---|---|
1.12.0 | Sep 13, 2024 |
1.11.0 | Feb 7, 2024 |
1.10.1 | Jan 31, 2023 |
1.10.0 | Sep 13, 2022 |
0.1.1 | Jul 9, 2015 |
#6 in Text processing
5,546,685 downloads per month
Used in 14,639 crates (630 directly)
400KB, 5K SLoC
Iterators which split strings on Grapheme Cluster or Word boundaries, according to the Unicode Standard Annex #29 rules.
use unicode_segmentation::UnicodeSegmentation;

fn main() {
    // Extended grapheme clusters: each base character stays with its combining marks.
    let s = "a̐éö̲\r\n";
    let g = s.graphemes(true).collect::<Vec<&str>>();
    let b: &[_] = &["a̐", "é", "ö̲", "\r\n"];
    assert_eq!(g, b);

    // Words only: punctuation and whitespace are skipped.
    let s = "The quick (\"brown\") fox can't jump 32.3 feet, right?";
    let w = s.unicode_words().collect::<Vec<&str>>();
    let b: &[_] = &["The", "quick", "brown", "fox", "can't", "jump", "32.3", "feet", "right"];
    assert_eq!(w, b);

    // Every segment between word boundaries, including whitespace and punctuation.
    let s = "The quick (\"brown\") fox";
    let w = s.split_word_bounds().collect::<Vec<&str>>();
    let b: &[_] = &["The", " ", "quick", " ", "(", "\"", "brown", "\"", ")", " ", "fox"];
    assert_eq!(w, b);
}
no_std
unicode-segmentation does not depend on libstd, so it can be used in crates with the #![no_std] attribute.
crates.io
You can use this package in your project by adding the following to your Cargo.toml:
[dependencies]
unicode-segmentation = "1.10.1"
Change Log
1.11.0
1.10.1
1.10.0
1.9.0
- #101 Upgrade to Unicode 14.0.0
1.8.0
- #100 Increase #[inline] opportunities, resulting in 15-40% performance improvement.
- #95 Implement debug for Graphemes
- #94 Add Initial fuzzer for oss-fuzz integration
- #93 Fix unused imports and deprecated pattern warnings
- #91 Made local variable immutable by moving it into loop
- #91 Add new iterator UnicodeWordIndices and unicode_word_indices
1.7.1
- Update docs on version number
1.7.0
- #87 Upgrade to Unicode 13
- #79 Implement a special-case lookup for ascii grapheme categories
- #77 Optimization for grapheme iteration
1.6.0
- #72 Upgrade to Unicode 12
1.5.0
- #68 Upgrade to Unicode 11
1.4.0
- #56 Upgrade to Unicode 10
1.3.0
1.2.1
1.2.0
- New GraphemeCursor API allows random access and bidirectional iteration.
- Fixed incorrect splitting of certain emoji modifier sequences.
1.1.0
- Add as_str methods to the iterator types.
1.0.3
- Code cleanup and additional tests.
1.0.1
- Fix a bug affecting some grapheme clusters containing Prepend characters.
1.0.0
- Upgrade to Unicode 9.0.0.