#text-search #simd #search-algorithms #search #string-search #text #string

sliceslice

A fast implementation of single-pattern substring search using SIMD acceleration

7 releases

0.4.2 Jan 30, 2024
0.4.1 Apr 27, 2022
0.3.1 May 18, 2021
0.2.1 Sep 24, 2020
0.1.0 Jul 22, 2020

#189 in Text processing

Download history 1678/week @ 2023-12-15 1296/week @ 2023-12-22 884/week @ 2023-12-29 1666/week @ 2024-01-05 1590/week @ 2024-01-12 1297/week @ 2024-01-19 1797/week @ 2024-01-26 1684/week @ 2024-02-02 1505/week @ 2024-02-09 1584/week @ 2024-02-16 1246/week @ 2024-02-23 1413/week @ 2024-03-01 2589/week @ 2024-03-08 4075/week @ 2024-03-15 3555/week @ 2024-03-22 2136/week @ 2024-03-29

12,491 downloads per month

MIT license

255KB
1.5K SLoC

sliceslice

Actions Crate Docs License

A fast implementation of single-pattern substring search using SIMD acceleration, based on the work presented by Wojciech Muła. For a fast multi-pattern substring search algorithm, see instead the aho-corasick crate.

Example

use sliceslice::x86::DynamicAvx2Searcher;

fn main() {
    let searcher = unsafe { DynamicAvx2Searcher::new(b"ipsum".to_owned().into()) };

    assert!(unsafe {
        searcher.search_in(b"Lorem ipsum dolor sit amet, consectetur adipiscing elit")
    });

    assert!(!unsafe {
        searcher.search_in(b"foo bar baz qux quux quuz corge grault garply waldo fred")
    });
}

Benchmarks

We ran the i386 benchmarks on an HP EliteDesk 800 G2 Tower PC with an Intel Core i7-6700 Processor @ 3.40GHz, 16GB of RAM and 512GB of disk space, running Ubuntu 20.04.1 LTS, gcc 9.3.0 and Rust 1.46.0.

Library Version Function Short haystack Long haystack
std 1.46.0 String::find [335.32 ms 335.56 ms 335.83 ms] [344.62 ms 345.01 ms 345.52 ms]
memmem 0.1.1 TwoWaySearcher::search_in [87.927 ms 88.029 ms 88.151 ms] [401.40 ms 401.59 ms 401.81 ms]
twoway 0.2.1 find_bytes [274.60 ms 274.82 ms 275.07 ms] [146.32 ms 146.44 ms 146.58 ms]
sse4-strstr¹ 0.0.0-9308a59 avx2_strstr_v2 [75.389 ms 75.515 ms 75.682 ms] [38.521 ms 38.579 ms 38.649 ms]
sliceslice 0.2.0 DynamicAvx2Searcher::search_in [79.283 ms 79.416 ms 79.596 ms] [35.135 ms 35.181 ms 35.247 ms]

¹ sse4-strstr is not memory safe as it can read beyond the end of the haystack, please see the documentation for sliceslice where this is discussed in more detail.

Benchmarks results column chart

Licensing

Licensed under the MIT license. See the LICENSE file for details.

Dependencies

~1.2–1.7MB
~38K SLoC