1 unstable release
0.1.0 | Jul 3, 2022 |
---|
#480 in Compression
44KB
523 lines
fast-smaz
Pure Rust implementation of smaz algorithm, very close to the original antirez/smaz C implementation with some differences.
Example
use fast_smaz::Smaz;
fn main() {
let my_message = "Hello world!";
let compressed = my_message.smaz_compress();
let decompressed = compressed.smaz_decompress().unwrap();
assert_eq!(my_message.as_bytes(), decompressed);
}
Inspiration
This implementation was inspired in both original antirez/smaz and the peterc/rsmaz, a Ruby implementation of smaz.
Comparison
fast-smaz and antirez/smaz
This implementation is very similar to the original ANSI C implementation of SMAZ, it uses the exact same hash-table and hashing algorithm.
I haven't measured how performance compares, but the ANSI C implementation can do a lot of assumptions both tied to ANSI C specification and correctness of the implementation, while this Rust implementation has some additional checks to ensure the safetyness of the code, which may impact the performance, but should not be that impactful.
fast-smaz and silentsokolov/rust-smaz
silentsokolov/rust-smaz uses lazy_static!
and creates the codebook hash table
at first execution, instead of shipping a pre-generated hash table.
This translates into a slower encoding process (benchmarks on my computer shows 73% slower encoding), not only because of the overhead
of lazy_static!
initialization, but also the overhead of the atomic operation for initialization check and the
hash distribution itself.
I've choosen to keep antirez original bucket-size and distribution because it looked well distributed, and performance results shows that it does have a very good performance.
Note that changing lazy_static!
to OnceLock
does
not help, since both are very similar.
The performance improvement only applies to the encoding process, the decoding process has a similar performance, since both silentsokolov/rust-smaz and fast-smaz share a similar code, which is a simple lookup to the reverse codebook.
Also, fast-smaz uses const
for the codebook and reverse codebook, which allows the compiler to do aggressive optimizations,
this lowers cache-misses and RAM accesses, since const
variables can (and probably will) be inlined at the use-site.
Features
custom-cookbook
: For development only, was used as a testing codebook before the proper translation of the
origional C implementation codebook. It's way more inefficient and with a poor distribution. It was constructed using
the src/hashing.rs
algorithm, I didn't wast too much time improving it and ensuring its correctness, it was only used
to generate a correct and functional codebook for testing. Don't use this.