1 unstable release
new 0.1.0 | Feb 20, 2025 |
---|
#433 in Text processing
14KB
133 lines
fast_whitespace_collapse
A high-performance Rust crate for collapsing consecutive spaces and tabs into a single space.
Uses SIMD (u8x16
) via the wide
crate for efficient processing.
Automatically falls back to a scalar implementation if SIMD is unavailable.
Features
- Collapses multiple spaces and tabs into a single space.
- Preserves newlines and non-whitespace characters.
- Uses SIMD (
u8x16
) when supported to process 16 bytes at a time. - Falls back to a fast scalar implementation if SIMD is unavailable.
- Ensures valid UTF-8 output.
- SIMD requires AVX2, SSE2, or NEON instruction sets.
Installation
Add this to your Cargo.toml
:
[dependencies]
fast_whitespace_collapse = "0.1"
Or run the following command:
cargo add fast_whitespace_collapse
Controlling SIMD Support
By default, SIMD acceleration is enabled. You can control it via Cargo features:
πΉ Disable SIMD for Embedded Targets
cargo build --no-default-features
πΉ Explicitly Enable SIMD
cargo build --features simd-optimized
Usage
use fast_whitespace_collapse::collapse_whitespace;
let input = "This is \t a test.";
let output = collapse_whitespace(input);
assert_eq!(output, "This is a test.");
Performance
- Processes text using SIMD (
u8x16
), handling 16 bytes in parallel. - Falls back to scalar processing when SIMD is unavailable.
- Handles large inputs efficiently while maintaining valid UTF-8 output.
Benchmark Results
Comparison with Other Approaches
Method | Time |
---|---|
Regex approach | 11.289 Β΅s |
collapse crate | 1.2624 Β΅s |
Iterative approach | 629.60 ns |
Iterative bytes | 428.00 ns |
fast_whitespace_collapse crate | 388.73 ns |
π fast_whitespace_collapse
outperforms other methods, achieving the lowest execution time.
π Benchmark executed on Apple M1 Pro (NEON SIMD enabled).
πΉ Run Your Own Benchmark
cargo bench
Compatibility
fast_whitespace_collapse supports multiple architectures:
- x86_64: Uses SIMD (
SSE2
,AVX2
) for maximum performance. - ARM (aarch64, M1/M2/M3): Uses NEON SIMD.
- **Other: Falls back to a scalar implementation.
Examples
Basic Usage
use fast_whitespace_collapse::collapse_whitespace;
assert_eq!(collapse_whitespace("Hello world"), "Hello world");
assert_eq!(collapse_whitespace(" Trim spaces " ), "Trim spaces");
assert_eq!(collapse_whitespace("Tabs\t\tconverted"), "Tabs converted");
Unicode Support
assert_eq!(collapse_whitespace("γγγ«γ‘γ― δΈη"), "γγγ«γ‘γ― δΈη"); // Japanese
assert_eq!(collapse_whitespace("δ½ ε₯½ δΈη"), "δ½ ε₯½ δΈη"); // Chinese
assert_eq!(collapse_whitespace("π π π"), "π π π"); // Emojis
Handling Newlines
assert_eq!(collapse_whitespace("Line1\n Line2\nLine3"), "Line1\n Line2\nLine3");
Tests
Run tests with:
cargo test
License
This project is licensed under the MIT License.
Dependencies
~215KB