#compression #lz77 #kindle #format #palmdoc

palmdoc-compression

Fast & safe implementation of PalmDoc/MOBI/AZW/Kindle flavored LZ77

5 releases

0.3.1 Apr 14, 2024
0.3.0 Apr 13, 2024
0.2.3 Apr 10, 2024
0.2.2 Apr 10, 2024
0.2.1 Apr 10, 2024

#555 in Text processing

MIT license

36KB
477 lines

🖐️ palmdoc-compression

docs.rs

This is a fast, safe, and correct implementation of PalmDoc-flavored LZ77 compression (primarily used by Amazon ebook formats). Compression is 300-400x faster than Calibre's implementation with a comparable compression ratio.

This crate also includes Calibre's version for comparison and usage if desired, gated behind the calibre feature.

Usage

use palmdoc_compression::{compress, decompress};

let data = b"hello world";

let compressed = compress(data);
let decompressed = decompress(&compressed).unwrap();

assert_eq!(data, decompressed);

⚡ Benchmarks

MOBI/AZW files are split into 4KB chunks, so benchmarks here also use 4KB chunks. Benchmarks were run on a M1 Max.

For a 4KB chunk of lorem ipsum text:

Decompression Compression
Calibre 922 MiB/s 252 KiB/s
palmdoc-compression 797 MiB/s 109 MiB/s

For a random 4KB chunk of War and Peace from Project Gutenberg:

Decompression Compression
Calibre 1011 MiB/s 336 KiB/s
palmdoc-compression 876 MiB/s 103 MiB/s

(Reproduce with cargo bench --features calibre.)

Compression ratio

Ratios calculated by compressing War and Peace from Project Gutenberg in 4KB chunks.

ratio, ⬇️ is better
calibre 0.56% (theoretical max)
palmdoc-compression 0.57%

(Reproduce with cargo run --example ratios --release --features calibre.)

Credits

  • LPeter1997 for a clear and understandable Rust LZ77 implementation with hash chains
  • Calibre for a reference implementation with tests

Dependencies

~0.3–1MB
~21K SLoC