3 releases
Version | Date |
---|---|
0.1.3 | May 6, 2024 |
0.1.2 | Sep 17, 2023 |
0.1.1 | |
0.1.0 | Aug 22, 2023 |
#219 in Compression
389 downloads per month
Used in 3 crates
(2 directly)
28KB
567 lines
LZJD
Rust implementation of the Lempel-Ziv Jaccard Distance (LZJD) algorithm, based on jLZJD by Edward Raff.
Main differences:
- Rust instead of Java
- Can use any hasher (executable uses CRC32) instead of just Murmur3
- Does not allocate memory for every unique hash; instead keeps only the k=1024 smallest
- Based on `Vec<u64>` instead of `IntSetNoRemove`, which is more like HashMap
- Hash files are considerably smaller if small sequences have been digested
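
The idea behind the points above can be illustrated with a minimal sketch. This is not this crate's API: the function names `hash_bytes`, `digest`, and `similarity` are hypothetical, and the standard library's `DefaultHasher` stands in for whichever hasher is actually used (the executable uses CRC32). The sketch builds the Lempel-Ziv set of an input, keeps only the k smallest distinct hashes in a sorted `Vec<u64>`, and compares two digests with the Jaccard index.

```rust
use std::cmp::Ordering;
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::Hasher;

/// Hash a byte slice. DefaultHasher is only a stand-in (assumption);
/// any hasher producing a u64 would fit here.
fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

/// Build a digest: the k smallest distinct hashes of the substrings in the
/// Lempel-Ziv set of `data`, kept sorted in ascending order.
fn digest(data: &[u8], k: usize) -> Vec<u64> {
    let mut seen: HashSet<&[u8]> = HashSet::new();
    let mut hashes: Vec<u64> = Vec::new();
    let mut start = 0;
    for end in 1..=data.len() {
        let sub = &data[start..end];
        // A substring not seen before joins the LZ set, and the next
        // substring starts right after it.
        if seen.insert(sub) {
            let h = hash_bytes(sub);
            // Insert in sorted position only if the hash is new and ranks
            // among the k smallest; then drop anything beyond k entries.
            if let Err(pos) = hashes.binary_search(&h) {
                if pos < k {
                    hashes.insert(pos, h);
                    hashes.truncate(k);
                }
            }
            start = end;
        }
    }
    hashes
}

/// Jaccard similarity of two sorted digests: |A ∩ B| / |A ∪ B|.
fn similarity(a: &[u64], b: &[u64]) -> f64 {
    let (mut i, mut j, mut common) = (0usize, 0usize, 0usize);
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                common += 1;
                i += 1;
                j += 1;
            }
        }
    }
    let union = a.len() + b.len() - common;
    // Two empty digests are treated as identical (a convention of this sketch).
    if union == 0 {
        1.0
    } else {
        common as f64 / union as f64
    }
}
```

With these helpers, `similarity(&digest(a, 1024), &digest(b, 1024))` gives a value between 0 and 1 for two byte slices `a` and `b`. For simplicity this sketch still remembers every substring in `seen`; only the list of hashes is bounded by k.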
This fork has minor changes:
- Update to Rust edition 2021.
- Remove dependencies preventing it from working on non-x86 hardware.
USAGE:
    lzjd [FLAGS] [OPTIONS] <INPUT>...

FLAGS:
    -c, --compare        compare SDBFs in file, or two SDBF files
    -r, --deep           generate SDBFs from directories and files
    -g, --gen-compare    compare all pairs in source data
    -h, --help           Prints help information
    -V, --version        Prints version information

OPTIONS:
    -o, --output <FILE>            send output to files
    -t, --threshold <THRESHOLD>    only show results >= threshold [default: 1]

ARGS:
    <INPUT>...    Sets the input file to use
Dependencies
~3–10MB
~108K SLoC