24 releases (stable)
| Version | Released |
|---|---|
| 1.13.2 | Mar 26, 2026 |
| 1.12.0 | Sep 13, 2024 |
| 1.11.0 | Feb 7, 2024 |
| 1.10.1 | Jan 31, 2023 |
| 0.1.1 | Jul 9, 2015 |
#22 in Text processing
23,713,765 downloads per month
Used in 32,213 crates (1,056 directly)
405KB, 5K SLoC
Iterators which split strings on Grapheme Cluster or Word boundaries, according to the Unicode Standard Annex #29 rules.
use unicode_segmentation::UnicodeSegmentation;

fn main() {
    // Extended grapheme clusters: combining marks stay attached, "\r\n" is one cluster.
    let s = "a̐éö̲\r\n";
    let g = s.graphemes(true).collect::<Vec<&str>>();
    let b: &[_] = &["a̐", "é", "ö̲", "\r\n"];
    assert_eq!(g, b);

    // Words only: punctuation and whitespace segments are skipped.
    let s = "The quick (\"brown\") fox can't jump 32.3 feet, right?";
    let w = s.unicode_words().collect::<Vec<&str>>();
    let b: &[_] = &["The", "quick", "brown", "fox", "can't", "jump", "32.3", "feet", "right"];
    assert_eq!(w, b);

    // Every segment between word boundaries, including spaces and punctuation.
    let s = "The quick (\"brown\") fox";
    let w = s.split_word_bounds().collect::<Vec<&str>>();
    let b: &[_] = &["The", " ", "quick", " ", "(", "\"", "brown", "\"", ")", " ", "fox"];
    assert_eq!(w, b);
}
no_std
unicode-segmentation does not depend on libstd, so it can be used in crates
with the #![no_std] attribute.
crates.io
You can use this package in your project by adding the following
to your Cargo.toml:
[dependencies]
unicode-segmentation = "1"
Change Log
1.13.2
- #164 Set explicit 1.85 MSRV
- #147 Add ascii fast path for unicode_word_indices and unicode_words
- #157 Support Unicode 17.0.0
1.13.0, 1.13.1
Yanked due to accidental breakage and MSRV mistag.
1.12.0
- #131 Implement Debug on all public structs
- #136 Use stdlib alphabetic and numeric character tables
- #138 Fix arithmetic overflow
- #137 Fix unwrap panic in next_boundary()
- #140 Support Unicode 16.0.0
1.11.0
1.10.1
1.10.0
1.9.0
- #101 Upgrade to Unicode 14.0.0
1.8.0
- #100 Increase #[inline] opportunities, resulting in 15-40% performance improvement.
- #95 Implement debug for Graphemes
- #94 Add Initial fuzzer for oss-fuzz integration
- #93 Fix unused imports and deprecated pattern warnings
- #91 Made local variable immutable by moving it into loop
- #91 Add new iterator UnicodeWordIndices and unicode_word_indices
1.7.1
- Update docs on version number
1.7.0
- #87 Upgrade to Unicode 13
- #79 Implement a special-case lookup for ascii grapheme categories
- #77 Optimization for grapheme iteration
1.6.0
- #72 Upgrade to Unicode 12
1.5.0
- #68 Upgrade to Unicode 11
1.4.0
- #56 Upgrade to Unicode 10
1.3.0
1.2.1
1.2.0
- New GraphemeCursor API allows random access and bidirectional iteration.
- Fixed incorrect splitting of certain emoji modifier sequences.
1.1.0
- Add as_str methods to the iterator types.
1.0.3
- Code cleanup and additional tests.
1.0.1
- Fix a bug affecting some grapheme clusters containing Prepend characters.
1.0.0
- Upgrade to Unicode 9.0.0.