#algorithm #data #binary-data #substrings #patch #string #comparison

bcmp

bcmp offers fast binary data comparison algorithms to enumerate common substrings, unique substrings or determine a patch set

7 unstable releases (3 breaking)

Uses old Rust 2015

0.4.1 May 21, 2017
0.4.0 May 7, 2017
0.3.2 May 6, 2017
0.3.1 Apr 15, 2017
0.1.0 Apr 1, 2017

#2055 in Algorithms

Download history 147/week @ 2024-01-15 30/week @ 2024-01-22 27/week @ 2024-01-29 35/week @ 2024-02-05 4/week @ 2024-02-12 91/week @ 2024-02-19 60/week @ 2024-02-26 58/week @ 2024-03-04 147/week @ 2024-03-11 33/week @ 2024-03-18

316 downloads per month

MIT license

43KB
769 lines

bcmp

Crates.io Build Status Docs.rs

bcmp is a simple crate which offers data comparison mechanisms which go beyond the simple equality. It only operates on byte slices, hence its name, and relies on efficiently finding common substrings between two blob of data. The implementation relies on two different linear time algorithms: a HashMap based algorithm called HashMatch and a suffix tree built using Ukkonen algorithm called TreeMatch.

Examples

Iterate over the matches between two strings using HashMatch with a minimum match length of 2 bytes:

extern crate bcmp;

use bcmp::{AlgoSpec, MatchIterator};

fn main() {
    let a = "abcdefg";
    let b = "012abc34cdef56efg78abcdefg";
    let match_iter = MatchIterator::new(a.as_bytes(), b.as_bytes(), AlgoSpec::HashMatch(2));
    for m in match_iter {
        println!("Match: {:}", &a[m.first_pos..m.first_end()]);
    }
}

Construct a patch set to build the file b from the file a using TreeMatch with a minimum match length of 4 bytes:

extern crate bcmp;

use std::fs::File;
use std::io::Read;

use bcmp::{AlgoSpec, patch_set};

fn main() {
    let mut a = Vec::<u8>::new();
    let mut b = Vec::<u8>::new();
    File::open("a").unwrap().read_to_end(&mut a);
    File::open("b").unwrap().read_to_end(&mut b);

    let ps = patch_set(&a, &b, AlgoSpec::TreeMatch(4));
    for patch in ps {
        println!("b[0x{:x}..0x{:x}] == a[0x{:x}..0x{:x}]", patch.second_pos, patch.second_end(), patch.first_pos, patch.first_end());
    }
}

Dependencies

~1.5MB
~41K SLoC