#distance #edit-distance #edit #input-file #lzjd #lempel #ziv

bin+lib malwaredb-lzjd

Rust implementation of the LZJD algorithm by Edward Raff -- https://github.com/EdwardRaff/jLZJD

3 releases

0.1.3 May 6, 2024
0.1.2 Sep 17, 2023
0.1.1 Sep 17, 2023
0.1.0 Aug 22, 2023

#130 in Compression

1,473 downloads per month
Used in 3 crates (2 directly)

GPL-3.0 license

28KB
567 lines

LZJD

Documentation

Rust implementation of the Lempel-Ziv Jaccard Distance (LZJD) algorithm, based on jLZJD by Edward Raff.

Main differences:

  • Rust instead of Java
  • Can use any hasher (executable uses CRC32) instead of just Murmur3
  • Does not allocate memory for every unique hash; instead keeps only the k=1024 smallest
  • Stores hashes in a Vec<u64> instead of IntSetNoRemove, which behaves more like a HashMap
  • Hash files are considerably smaller if small sequences have been digested
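
To make the points above concrete, here is a minimal sketch of the general LZJD idea: split the input into Lempel-Ziv substrings, hash each one, keep the k smallest hashes in a sorted Vec<u64>, and compare two digests with a Jaccard ratio. This is not the crate's code. The function names (hash_bytes, digest, similarity) are hypothetical, the standard library's DefaultHasher stands in for CRC32 or Murmur3, and for clarity the sketch collects every hash before truncating, which the crate itself is described as avoiding.

```rust
use std::cmp::Ordering;
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hash one byte slice. DefaultHasher is an assumption of this sketch,
/// standing in for whichever hasher is actually plugged in.
fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Build a sorted digest holding at most `k` of the smallest hashes of the
/// Lempel-Ziv substrings of `data` (hypothetical helper).
fn digest(data: &[u8], k: usize) -> Vec<u64> {
    let mut seen: HashSet<&[u8]> = HashSet::new();
    let mut hashes: Vec<u64> = Vec::new();
    let mut start = 0;
    for end in 1..=data.len() {
        let sub = &data[start..end];
        // A substring not seen before ends the current phrase;
        // the next phrase starts right after it.
        if seen.insert(sub) {
            hashes.push(hash_bytes(sub));
            start = end;
        }
    }
    hashes.sort_unstable();
    hashes.dedup();
    hashes.truncate(k); // keep only the k smallest hashes
    hashes
}

/// Jaccard similarity of two sorted, deduplicated digests, in [0.0, 1.0].
fn similarity(a: &[u64], b: &[u64]) -> f64 {
    if a.is_empty() && b.is_empty() {
        return 1.0; // convention chosen for this sketch only
    }
    let (mut i, mut j, mut shared) = (0usize, 0usize, 0usize);
    // Merge-style walk over both sorted digests, counting common hashes.
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                shared += 1;
                i += 1;
                j += 1;
            }
        }
    }
    shared as f64 / (a.len() + b.len() - shared) as f64
}
```

Calling similarity(&digest(x, 1024), &digest(y, 1024)) on two byte slices x and y gives a score between 0.0 and 1.0; identical inputs score 1.0. How the lzjd executable scales or reports its scores is not shown here.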

This fork has minor changes:

  • Update to Rust edition 2021.
  • Remove dependencies preventing it from working on non-x86 hardware.

USAGE:
    lzjd [FLAGS] [OPTIONS] <INPUT>...

FLAGS:
    -c, --compare        compare SDBFs in file, or two SDBF files
    -r, --deep           generate SDBFs from directories and files
    -g, --gen-compare    compare all pairs in source data
    -h, --help           Prints help information
    -V, --version        Prints version information

OPTIONS:
    -o, --output <FILE>            send output to files
    -t, --threshold <THRESHOLD>    only show results >= threshold [default: 1]

ARGS:
    <INPUT>...    Sets the input file to use

Dependencies

~3–11MB
~121K SLoC