#string #substring #character #index #replace #indices #extract

substring-replace

This crate provides developer-friendly methods to manipulate strings with character indices

3 releases

0.1.7 Aug 7, 2024
0.1.6 Aug 6, 2024
0.1.3 Jul 31, 2024

#496 in Text processing

336 downloads per month

GPL-2.0-or-later WITH Bison-exception-2…

17KB
121 lines

substring-replace: Extract, insert and replace substrings

This crate adds developer-friendly substring methods for manipulating string slices in Rust by character index. The indices are compatible with multibyte characters, and the methods work in a similar way to substring() in JavaScript, Java or C# and substr() in C++ and PHP.

This crate's core substring method has the same signature and functionality as the simpler substring crate, but this crate adds many supplementary methods, such as substring_replace, avoids the need for unsafe blocks and fails gracefully if the start or end index is out of range. However, the two crates should not be added to the same project. If you only need the core substring method and already use the other well-supported crate, do not install this crate. On the other hand, if you need some of the extra features available from this crate, uninstall the other crate before installing this one and replace use substring::*; with use substring_replace::*;.

Regular Rust prefers slices to manipulate strings by byte index ranges. However, slicing panics when byte indices are out of range or fall between character boundaries. Character indices are more intuitive and compatible with the popular Regex crate.
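The byte-index pitfall can be reproduced with the standard library alone. The following is a minimal sketch that does not use this crate; the helper name try_byte_slice is hypothetical, while str::get is the standard non-panicking counterpart of range indexing.

```rust
/// Hypothetical helper (not part of this crate): slice by byte range without panicking.
fn try_byte_slice(text: &str, start: usize, end: usize) -> Option<&str> {
    // `str::get` returns None when an index is out of range
    // or does not fall on a character boundary,
    // whereas `&text[start..end]` would panic in both cases.
    text.get(start..end)
}
```

With "héllo", the byte range 0..3 yields Some("hé"), but 0..2 yields None because byte 2 lies inside the two-byte character é.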

substring

Returns a substring between start and end character indices. These indices differ from byte indices whenever the string contains multibyte characters, as found in extended Latin script, most non-Latin alphabets, many special symbols and emojis.

let sample_str = "/long/file/path";
let result = sample_str.substring(6, 10);
// the end index is exclusive, so the result is "file"
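For readers curious how a character-index substring can be built on top of byte slices, here is a minimal standard-library sketch. It is not this crate's implementation; the names byte_index_of and sketch_substring are hypothetical, and clamping out-of-range indices to the end of the string is an assumption made for the sketch.

```rust
/// Hypothetical helper: map a character index to a byte index,
/// falling back to the byte length when the index is past the end.
fn byte_index_of(text: &str, char_index: usize) -> usize {
    text.char_indices()
        .nth(char_index)
        .map(|(byte_index, _)| byte_index)
        .unwrap_or(text.len())
}

/// Hypothetical sketch: borrow the slice between two character indices (end exclusive).
fn sketch_substring(text: &str, start: usize, end: usize) -> &str {
    let start_byte = byte_index_of(text, start);
    // Assumption: an end before the start yields an empty slice rather than a panic.
    let end_byte = byte_index_of(text, end).max(start_byte);
    &text[start_byte..end_byte]
}
```

In this sketch, sketch_substring("größer", 2, 4) gives "öß", although the same characters occupy bytes 2 to 6.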

substring_replace

This method removes the characters between the specified start and end indices and inserts a replacement string in their place.

let new_string = "azdefgh".substring_replace("bc", 1, 2);
println!("{}", new_string);
// will print "abcdefgh"
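The same head, replacement, tail logic can be sketched with standard iterators. This is an illustration only, not the crate's code; the function name sketch_substring_replace is hypothetical, and it copies the argument order (replacement, start, end) from the example above.

```rust
/// Hypothetical sketch: replace the characters in [start, end) with `replacement`.
fn sketch_substring_replace(text: &str, replacement: &str, start: usize, end: usize) -> String {
    // Assumption: an end before the start is treated as an empty range.
    let end = end.max(start);
    // Everything before the start character index.
    let head: String = text.chars().take(start).collect();
    // Everything from the end character index onwards.
    let tail: String = text.chars().skip(end).collect();
    format!("{head}{replacement}{tail}")
}
```

In this sketch, equal start and end indices turn the call into a pure insertion, and an empty replacement turns it into a pure removal.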

substring_insert

This method inserts a string at a given character index. It differs from the standard String::insert_str method by using character rather than byte indices, so it works better with multibyte characters. It also works directly with &str, but returns a new owned string.

let sample_str = "a/c";
let result = sample_str.substring_insert("/b", 1);
// result will be "a/b/c"

substring_start

This method returns the start of a string (&str or String) up to the specified end character index.

let sample_str = "/long/file/path";
let result = sample_str.substring_start(5);
// the result is "/long"

substring_end

This method returns the end of a string (&str or String) from the specified start character index.

let sample_str = "/long/file/path";
let result = sample_str.substring_end(5);
// the result is "/file/path"

substring_replace_start

This method replaces the start of a string up to a specified end character index.

// remove the first 2 characters and prepend the string "xyz"
let new_string = "abcdefgh".substring_replace_start("xyz", 2);
println!("{}", new_string);
// will print "xyzcdefgh"

substring_replace_end

This method replaces the remainder of a string from a specified start character index.

// remove all characters from index 3 onwards and append the string "xyz"
let new_string = "abcdefgh".substring_replace_end("xyz", 3);
println!("{}", new_string);
// will print "abcxyz"

substring_remove

This method returns the remainder after removing a substring delimited by start and end character indices. It's the opposite of substring(start, end).

let sample_str = "abcdefghij";
let result = sample_str.substring_remove(3, 6);
// result will be "abcghij"

substring_offset

This method extracts a substring from a start index for n characters to the right or left. A negative length in the second parameter will end at the reference index.

let sample_str = "indian-elephant";
let result = sample_str.substring_offset(7, 3);
// result will be "ele"
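The signed length can be resolved into an ordinary start and end range before extracting. Below is a minimal standard-library sketch of that idea, not the crate's implementation; offset_range and sketch_substring_offset are hypothetical names, and clamping at index 0 when a negative length overshoots the start is an assumption.

```rust
/// Hypothetical helper: turn (position, signed length) into a [start, end) character range.
fn offset_range(position: usize, length: i32) -> (usize, usize) {
    let span = length.unsigned_abs() as usize;
    if length >= 0 {
        // Positive length: extend to the right of the reference index.
        (position, position.saturating_add(span))
    } else {
        // Negative length: end at the reference index, clamping the start at 0.
        (position.saturating_sub(span), position)
    }
}

/// Hypothetical sketch: extract the characters covered by the resolved range.
fn sketch_substring_offset(text: &str, position: usize, length: i32) -> String {
    let (start, end) = offset_range(position, length);
    text.chars().skip(start).take(end - start).collect()
}
```

With "indian-elephant", the sketch returns "ele" for (7, 3) and "ian" for (6, -3).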

substring_pull

This method returns the remainder after removing a substring from a start index for n characters to the right or left. It's the opposite of substring_offset(position, length). As with substring_offset, a negative length in the second parameter will end at the reference index.

let sample_str = "indian-elephant";
let result = sample_str.substring_pull(7, 3);
// result will be "indian-phant"
let result = sample_str.substring_pull(6, -3);
// result will be "ind-elephant"
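Removal by offset can be sketched as a filter over enumerated characters. This is a standard-library illustration rather than the crate's code; sketch_substring_pull is a hypothetical name, and the clamping of the range at index 0 is an assumption.

```rust
/// Hypothetical sketch: drop `length` characters starting at (or, for a negative
/// length, ending at) `position` and return everything else.
fn sketch_substring_pull(text: &str, position: usize, length: i32) -> String {
    let span = length.unsigned_abs() as usize;
    let (start, end) = if length >= 0 {
        (position, position.saturating_add(span))
    } else {
        (position.saturating_sub(span), position)
    };
    text.chars()
        .enumerate()
        // Keep only the characters outside the [start, end) range.
        .filter(|(index, _)| *index < start || *index >= end)
        .map(|(_, character)| character)
        .collect()
}
```

With "indian-elephant", the sketch returns "indian-phant" for (7, 3) and "ind-elephant" for (6, -3).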

to_start_byte_index and to_end_byte_index

These methods convert either a start character index into a start byte index or an end character index into an end byte index. They're mainly used internally to build a string slice. They differ only in their default value. For to_start_byte_index the default value is 0, while for to_end_byte_index it's the endmost index.

let byte_index = "नमस्ते".to_start_byte_index(2);
// yields the byte index at the start of the third multibyte character (character index 2), which should be 6

char_len

This returns the character length in terms of individual Unicode characters (Rust char values), as opposed to the byte length returned by str::len(). It is shorthand for str::char_indices().count().

let emoji = "😎";
println!("Emoji length: {}, emoji byte length: {}", emoji.char_len(), emoji.len() );
// prints: Emoji length: 1, emoji byte length: 4
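The count is per Rust char, not per visible glyph, which matters for combining marks. A short standard-library sketch follows; sketch_char_len is a hypothetical stand-in written from the description above, not the crate's method.

```rust
/// Hypothetical stand-in for char_len: counts Rust `char` values.
fn sketch_char_len(text: &str) -> usize {
    text.char_indices().count()
}

/// Hypothetical helper: pair the character count with the byte length.
fn char_and_byte_len(text: &str) -> (usize, usize) {
    (sketch_char_len(text), text.len())
}
```

For example, the letter e followed by a combining acute accent (U+0301) renders as one glyph but counts as 2 characters and 3 bytes.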

char_find

This finds the first character index of a plain string pattern. Like the standard find method, it returns an optional unsigned integer (Option<usize>). To search from right to left, while still returning the index of the first character in the matched sequence, you can use char_rfind.

let greek_words = "μήλα και πορτοκάλια";
let search_word = "και";
let character_index = greek_words.char_find(search_word);
let byte_index = greek_words.find(search_word);
// both methods return Option<usize>, so unwrap the values before formatting them
if let (Some(char_idx), Some(byte_idx)) = (character_index, byte_index) {
    println!("The word {search_word} starts at a character index of {char_idx} and a byte index of {byte_idx}");
}
// prints: The word και starts at a character index of 5 and a byte index of 9
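One way to derive a character index from the standard byte-based search is to count the characters that precede the match. The sketch below uses only the standard library and is not the crate's implementation; sketch_char_find and sketch_char_rfind are hypothetical names.

```rust
/// Hypothetical sketch: character index of the first match, searching left to right.
fn sketch_char_find(text: &str, pattern: &str) -> Option<usize> {
    // `str::find` returns a byte index on a character boundary,
    // so slicing up to it is safe; counting the chars before it gives the character index.
    text.find(pattern)
        .map(|byte_index| text[..byte_index].chars().count())
}

/// Hypothetical sketch: character index of the last match, searching right to left.
fn sketch_char_rfind(text: &str, pattern: &str) -> Option<usize> {
    text.rfind(pattern)
        .map(|byte_index| text[..byte_index].chars().count())
}
```

Both functions return None when the pattern is absent, mirroring the standard find and rfind methods.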

NB: This is an alpha release, but the crate is feature-complete and supplements string-patterns and simple-string-patterns.

Version history

1.3: Added new methods .substring_remove(start: usize, end: usize) and .substring_pull(position: usize, length: i32).

1.5: Added new methods .char_find(pat: &str) and .char_rfind(pat: &str).

No runtime deps