16 releases (10 stable)

Version | Released |
---|---|
2.3.4 (new) | Apr 13, 2025 |
2.3.0 | Jul 15, 2024 |
1.0.2 | Jun 20, 2024 |
0.3.0 | Apr 10, 2024 |
0.2.0 | Dec 27, 2022 |
whitespace-sifter
use whitespace_sifter::WhitespaceSifter;

// This prints `1.. 2.. 3.. 4.. 5..`.
println!(
    "{}",
    "1.. \n2.. \n\r\n\n3.. \n\n\n4.. \n\n\r\n\n\n5.. \n\n\n\n\n".sift(),
);

// This prints `1..\n2..\n3..\n4..\r\n5..`.
println!(
    "{}",
    "1.. \n2.. \n\r\n3.. \n\n\n4.. \r\n\n\r\n\n5.. \n\n\n\n\n"
        .sift_preserve_newlines(),
);
✨ Sift Duplicate Whitespaces In One Function Call
This crate helps you remove duplicate whitespaces within a UTF-8 encoded string.
It naturally removes the whitespaces at the start and end of the string.
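To make the behaviour concrete without pulling in the crate, here is a minimal standard-library sketch of the same idea: trim both ends, then keep only the first character of every whitespace run. The function name `collapse_whitespace` is hypothetical and this is not the crate's actual implementation (which is tuned for speed); it only assumes the ASCII whitespace set and the output shown in the example above.

```rust
/// Hypothetical stand-in for illustration only; NOT the crate's implementation.
/// Trims ASCII whitespace at both ends and keeps the first character of
/// every run of ASCII whitespace in between.
fn collapse_whitespace(input: &str) -> String {
    let trimmed = input.trim_matches(|c: char| c.is_ascii_whitespace());
    let mut out = String::with_capacity(trimmed.len());
    let mut in_run = false; // true while we are inside a whitespace run

    for c in trimmed.chars() {
        if c.is_ascii_whitespace() {
            if !in_run {
                out.push(c); // keep only the first whitespace of the run
                in_run = true;
            }
        } else {
            out.push(c);
            in_run = false;
        }
    }
    out
}

fn main() {
    let input = "1.. \n2.. \n\r\n\n3.. \n\n\n4.. \n\n\r\n\n\n5.. \n\n\n\n\n";
    // prints `1.. 2.. 3.. 4.. 5..`
    println!("{}", collapse_whitespace(input));
}
```

For real use, prefer the crate's `.sift()`; the sketch allocates naively and walks the string one `char` at a time.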
⚡️ Benchmarks
Performance is a priority; most updates are performance improvements.
The benchmark uses a transcript of the Bee Movie.
Execute these commands to benchmark:
$ git clone https://github.com/JumperBot/whitespace-sifter.git
$ cd whitespace-sifter
$ cargo bench
You should only look for results that look like the following:
Sift/Sift time: [178.69 µs 178.84 µs 179.03 µs]
Sift Preserved/Sift Preserved
time: [179.61 µs 179.75 µs 179.90 µs]
That is under 0.0002 seconds; pretty impressive, no?
Go try it on a better machine, I guess.
Benchmark specifications:
- Processor: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz 1.90 GHz
- Memory: RAM 16.0 GB (15.8 GB usable)
- System: GNU/Linux 5.15.153.1-microsoft-standard-WSL2 x86_64
- Modified: v2.3.4
📈 Crate Comparison
Using the same benchmark configuration:
Crate | Dictionary | Features | Time | Complete Sift |
---|---|---|---|---|
whitespace-sifter | ASCII Whitespaces | Preserve Newlines | ~170 µs | ✅ |
collapse | Unicode Whitespaces | | ~270 µs | ✅ |
fast_whitespace_collapse | ASCII Space and Tab | SIMD | ~160 µs | ❌ |
Disclaimers:
1: I neither know the crate maintainers nor asked for permission to include their crates here.
2: As far as I know, there are only three crates dedicated to whitespace sifting/collapse.
3: fast_whitespace_collapse was not able to collapse CRLF and line feeds.
Dictionary | Characters |
---|---|
ASCII Whitespaces | '\t' \| '\n' \| '\x0C' \| '\r' \| ' ' |
Unicode Whitespaces | ' ' \| '\x09'..='\x0d' \| unicode::White_Space(c) |
ASCII Space and Tab | ' ' \| '\t' |
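The three dictionaries can be written as plain Rust predicates to see where they differ. This is only a sketch: the function names are hypothetical, and the Unicode set is modelled with the standard library's `char::is_whitespace` (the Unicode `White_Space` property), which is assumed, not confirmed, to match what the other crate checks.

```rust
/// ASCII Whitespaces: '\t' | '\n' | '\x0C' | '\r' | ' '
/// (hypothetical helper name; same set as `char::is_ascii_whitespace`)
fn is_ascii_ws(c: char) -> bool {
    matches!(c, '\t' | '\n' | '\x0C' | '\r' | ' ')
}

/// Unicode Whitespaces, modelled with the standard library's
/// `White_Space` check (an assumption about the compared crate).
fn is_unicode_ws(c: char) -> bool {
    c.is_whitespace()
}

/// ASCII Space and Tab: ' ' | '\t'
fn is_space_or_tab(c: char) -> bool {
    matches!(c, ' ' | '\t')
}

fn main() {
    // Sample characters: space, tab, line feed, vertical tab, no-break space.
    for c in [' ', '\t', '\n', '\x0B', '\u{00A0}'] {
        println!(
            "{:?}: ascii={} unicode={} space_or_tab={}",
            c,
            is_ascii_ws(c),
            is_unicode_ws(c),
            is_space_or_tab(c)
        );
    }
}
```

Note that the vertical tab (`'\x0B'`) and the no-break space fall only in the Unicode set, while `'\n'` and `'\r'` are missing from the space-and-tab set, which is consistent with disclaimer 3.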
🔊 Changelog
- Improved Performance
- Minimum Supported Rust Version set to v1.79.0 (starting v2.3.3)
- Stricter Tests (starting v2.3.2)
  - Proper UTF-8/Unicode Encoding
  - Regular Sifting
  - Sifting With Leading Whitespaces
  - Documentation Assertion
  - MSRV Verification
- Crate Comparison (starting v2.3.4)
📄 Licensing
whitespace-sifter is licensed under the MIT LICENSE; this is the summarization.