2 releases

Uses old Rust 2015

0.1.1 Jul 23, 2016
0.1.0 Jul 23, 2016

#1633 in Text processing


922 downloads per month
Used in 3 crates (via gaoya)

MIT license

15KB
278 lines

shingles.rs


Shingles implementation in Rust

See docs (0.1 / master)

Overview

Shingles is a crate for constructing shingles ("tokenizing") from slices and UTF-8 strings. It was created primarily for use in fuzzy matching algorithms such as MinHash and similar techniques.
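To make the term concrete, here is a minimal sketch of what shingling means, written with the standard library only. It is not the crate's implementation: `char_shingles` is a hypothetical helper introduced for illustration, and the choice to drop a trailing window shorter than `size` is an assumption of this sketch.

```rust
/// Returns every window of `size` characters from `text`, advancing `step`
/// characters between windows. Hypothetical helper, not part of the crate.
/// A trailing window shorter than `size` is dropped (an assumption here).
fn char_shingles(text: &str, size: usize, step: usize) -> Vec<&str> {
    assert!(size > 0 && step > 0, "size and step must be non-zero");
    // Byte offsets of every character boundary, plus the end of the string,
    // so that slicing never lands inside a multi-byte UTF-8 sequence.
    let bounds: Vec<usize> = text
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(text.len()))
        .collect();
    let n_chars = bounds.len() - 1;
    let mut out = Vec::new();
    let mut start = 0;
    while start + size <= n_chars {
        out.push(&text[bounds[start]..bounds[start + size]]);
        start += step;
    }
    out
}

fn main() {
    // For slices, the standard library's `windows` already yields the
    // overlapping runs that shingling is built on.
    let v = [1, 2, 3, 4];
    let num: Vec<&[i32]> = v.windows(3).collect();
    assert_eq!(num, vec![&v[0..3], &v[1..4]]);

    // For strings, windows are counted in characters rather than bytes.
    let s = char_shingles("привет!", 4, 2);
    assert_eq!(s, vec!["прив", "ивет"]);
}
```

The string case needs the boundary table because each Cyrillic letter occupies two bytes in UTF-8, so a fixed byte stride would split characters.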

Examples

extern crate shingles;

use shingles::AsShingles;

fn main() {
    let v = [1, 2, 3, 4];
    let mut num_sh = v.as_shingles(3);
    let mut str_sh = "привет!".as_shingles_with_step(4, 2);
    
    assert_eq!(Some(&v[0..3]), num_sh.next());
    assert_eq!(Some(&v[1..4]), num_sh.next());
    
    assert_eq!(Some("прив"), str_sh.next());
    assert_eq!(Some("ивет"), str_sh.next());
    
    for h in "привет!".as_shingles(4).hashes() {
        // prints hash for each shingle
        println!("{}", h);
    }
}
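Shingle hashes like the ones printed above are usually compared as sets. The following sketch shows one common way to do that, Jaccard similarity, which is the quantity MinHash approximates. It uses only the standard library, so the hashing scheme (`DefaultHasher`) and the helper names `shingle_hashes` and `jaccard` are assumptions of this example and may differ from what the crate's `hashes()` produces.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hashes every run of `size` consecutive characters in `text`.
/// Hypothetical helper; the crate's own hash values may differ.
fn shingle_hashes(text: &str, size: usize) -> HashSet<u64> {
    assert!(size > 0, "size must be non-zero");
    let chars: Vec<char> = text.chars().collect();
    chars
        .windows(size)
        .map(|w| {
            let mut hasher = DefaultHasher::new();
            w.hash(&mut hasher);
            hasher.finish()
        })
        .collect()
}

/// Jaccard similarity |A ∩ B| / |A ∪ B|, defined here as 1.0 for two empty sets.
fn jaccard(a: &HashSet<u64>, b: &HashSet<u64>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 1.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

fn main() {
    let a = shingle_hashes("привет, мир", 4);
    let b = shingle_hashes("привет, мир!", 4);
    let c = shingle_hashes("hello world", 4);
    // `a` and `b` share all but one shingle; `a` and `c` share none.
    println!("a~b = {:.2}, a~c = {:.2}", jaccard(&a, &b), jaccard(&a, &c));
}
```

Comparing exact sets is fine for short inputs; MinHash becomes worthwhile when the sets are too large to store or intersect directly.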

2D shingle examples

extern crate shingles;

use shingles::AsShingles2D;

fn main() {
    let v: Vec<_> = "abcd\n\
                     efgh\n\
                     ijkl"
        .split_terminator("\n")
        .collect();

    let mut sh_2d = v.as_shingles_2d([3, 3]);

    assert_eq!(
        Some(vec![&v[0][0..3], &v[1][0..3], &v[2][0..3]]),
        sh_2d.next()
    );

    // You can easily get hashes from 2D-shingles
    for h in v.as_shingles_2d([3, 3]).hashes() {
        // print u64 hash value for each 2D-shingle
        println!("{}", h);
    }
}
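For readers who want to see what a 2D shingle contains, the sketch below builds the same kind of windows by hand with the standard library. It is an illustration under stated assumptions, not the crate's algorithm: `shingles_2d` is a hypothetical function, the input is assumed to be ASCII (so byte offsets equal character offsets), and the left-to-right, top-to-bottom traversal order is a choice made for this example.

```rust
/// Collects every `rows` x `cols` window over `lines`, scanning left to right
/// and then top to bottom. Hypothetical helper; assumes ASCII input so that
/// byte ranges are valid string slices.
fn shingles_2d<'a>(lines: &[&'a str], rows: usize, cols: usize) -> Vec<Vec<&'a str>> {
    let mut out = Vec::new();
    if rows == 0 || cols == 0 || lines.len() < rows {
        return out;
    }
    // Only columns present in every line can be part of a window.
    let width = lines.iter().map(|l| l.len()).min().unwrap_or(0);
    if width < cols {
        return out;
    }
    for r in 0..=lines.len() - rows {
        for c in 0..=width - cols {
            // One window: the same column range taken from `rows` adjacent lines.
            let window: Vec<&'a str> = lines[r..r + rows]
                .iter()
                .map(|&l| &l[c..c + cols])
                .collect();
            out.push(window);
        }
    }
    out
}

fn main() {
    let v = ["abcd", "efgh", "ijkl"];
    let windows = shingles_2d(&v, 3, 3);
    assert_eq!(windows[0], vec!["abc", "efg", "ijk"]);
    assert_eq!(windows[1], vec!["bcd", "fgh", "jkl"]);
}
```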

No runtime deps