1 unstable release
Uses new Rust 2024
new 0.1.0 | May 16, 2025 |
---|
#812 in Compression
27KB
383 lines
ppmd-core
A pure-Rust implementation of the PPMd (Prediction by Partial Matching, variant D) compressor with an underlying range coder.
Designed for safety (no unsafe
), and zero-dependency entropy coding in your Rust projects.
Features
- PPMd order-N modeling (1 ≤ N ≤ 16)
- Adaptive frequency tables with escape mechanism
- High-speed range encoder/decoder
- Configurable context order: trade CPU/memory vs. compression ratio
- Safe Rust only (
#![forbid(unsafe_code)]
)
Quick Start
File-based API
use ppmd_core::{encode_file, decode_file, PpmResult};
fn main() -> PpmResult<()> {
encode_file("data/input.bin", "data/output.ppmd", None)?;
encode_file("data/input.bin", "data/output_o8.ppmd", Some(8))?;
decode_file("data/output.ppmd", "data/decoded.bin")?;
Ok(())
}
In-memory API
If you prefer streaming or in-memory buffers:
use ppmd_core::{PpmModel, PpmResult};
use std::io::{Cursor, Read, Write};
fn compress_bytes(input: &[u8], order: u8) -> PpmResult<Vec<u8>> {
let mut model = PpmModel::new(order)?;
let mut output = Vec::new();
let mut writer = Cursor::new(&mut output);
// (Optionally write length prefix yourself if needed)
model.encode(&mut Cursor::new(input), &mut writer)?;
Ok(output)
}
fn decompress_bytes(input: &[u8], original_len: usize) -> PpmResult<Vec<u8>> {
use ppmd_core::RangeDecoder;
use std::io::BufWriter;
let mut reader = Cursor::new(input);
let mut decoder = RangeDecoder::new(&mut reader)?;
let mut model = PpmModel::new(5)?; // must match encoder’s order
let mut history = Vec::new();
let mut output = Vec::with_capacity(original_len);
while output.len() < original_len {
let mut byte = [0];
model.decode_symbol(&mut decoder, &mut history, &mut byte)?;
output.push(byte[0]);
}
Ok(output)
}
Tuning and Performance
- Order (context length)
- Low (1–3): very fast, minimal memory, poorer compression
- Medium (4–8): balanced speed vs. ratio (DEFAULT = 5)
- High (9–16): best ratio, more memory, slower
MAX_FREQ
: Controls the maximum per-symbol count in any context (prevents overflow).
You can tweak DEFAULT_ORDER
or call encode_file(..., Some(order))
to experiment.
Benchmark (TODO) your own data with:
cargo bench
License
MIT OR Apache-2.0 (See LICENSE-APACHE and LICENSE-MIT files.)
Dependencies
~225–660KB
~15K SLoC