zenjpeg
A pure Rust JPEG encoder and decoder with perceptual optimizations.
Important: The decoder requires the `decoder` feature flag: `zenjpeg = { version = "0.6", features = ["decoder"] }`. See Feature Flags for details.
Note: This crate was previously published as `jpegli-rs`. If migrating, update your imports from `use jpegli::` to `use zenjpeg::`.
Heritage and Divergence
This project started as a port of jpegli, Google's improved JPEG encoder from the JPEG XL project. After six rewrites it has diverged significantly into an independent project.
Ideas adopted from jpegli:
- Adaptive quantization (content-aware bit allocation)
- XYB color space with ICC profiles (progressive mode recommended for best compression)
- Perceptually-tuned quantization tables
- Zero-bias strategies for coefficient rounding
Ideas adopted from mozjpeg:
- Overshoot deringing for documents/graphics
- Trellis quantization for optimal coefficient selection
- Hybrid trellis mode (experimental, see Trellis Modes below)
Where we went our own way:
- Pure Rust, `#![forbid(unsafe_code)]` unconditionally (SIMD via safe archmage tokens)
- Streaming encoder API for memory efficiency (process images row-by-row)
- Portable SIMD via the `wide` crate instead of platform intrinsics
- Parallel encoding support
- UltraHDR support (HDR gain maps for backward-compatible HDR JPEGs)
- Independent optimizations and bug fixes
Features
- Pure Rust - No C/C++ dependencies, builds anywhere Rust does
- Perceptual optimization - Adaptive quantization for better visual quality at smaller sizes
- Trellis quantization - Optimal coefficient selection from mozjpeg
- Overshoot deringing - Eliminates ringing artifacts on documents and graphics (enabled by default)
- Backward compatible - Produces standard JPEG files readable by any decoder
- SIMD accelerated - Portable SIMD via the `wide` crate
- Streaming API - Memory-efficient row-by-row encoding for large images
- Parallel encoding - Multi-threaded for large images (1024x1024+)
- UltraHDR support - Encode/decode HDR gain maps (optional `ultrahdr` feature)
- Color management - Optional ICC profile support
Known Limitations
- XYB color space - With progressive mode, matches or beats C++ jpegli file sizes. Baseline mode is 2-3% larger.
- XYB decoder speed - XYB images use f32 pipeline; standard JPEG decoding uses fast integer IDCT.
Trellis Modes
zenjpeg supports three quantization modes:
Standard (jpegli-style)
Default mode. Uses adaptive quantization with perceptual zero-bias. Good balance of speed and quality.
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter);
Standalone Trellis (mozjpeg-style)
Rate-distortion optimized coefficient selection. Typically 10-15% smaller files at equivalent quality. Slightly slower due to dynamic programming optimization.
use zenjpeg::encode::{ExpertConfig, OptimizationPreset, ColorMode, ChromaSubsampling};
let expert = ExpertConfig::from_preset(OptimizationPreset::MozjpegBaseline, 85);
let config = expert.to_encoder_config(ColorMode::YCbCr {
subsampling: ChromaSubsampling::Quarter,
});
Hybrid Trellis (recommended)
Combines jpegli's adaptive quantization with mozjpeg's trellis. This is our best mode
and is enabled via .auto_optimize(true):
- +1.5 SSIM2 points vs jpegli at matched file size
- -1.5% to -2% smaller files at matched quality
- Works across q50-q95 range
use zenjpeg::encoder::{EncoderConfig, ChromaSubsampling};
// Recommended: use auto_optimize for best results
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter)
.auto_optimize(true);
Quick Start
Encode
use zenjpeg::encoder::{EncoderConfig, PixelLayout, ChromaSubsampling, Unstoppable};
// Best quality/size with auto_optimize
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter)
.auto_optimize(true);
let mut enc = config.encode_from_bytes(width, height, PixelLayout::Rgb8Srgb)?;
enc.push_packed(&rgb_bytes, Unstoppable)?;
let jpeg_bytes: Vec<u8> = enc.finish()?;
Decode
Requires features = ["decoder"] (prerelease API).
use zenjpeg::decoder::Decoder;
use enough::Unstoppable;
let result = Decoder::new().decode(&jpeg_bytes, Unstoppable)?;
let rgb_pixels: &[u8] = result.pixels_u8().expect("u8 output");
let (width, height) = result.dimensions();
Resource Limits and Cancellation
Resource Limits (DoS Protection)
Protect against malicious images that could exhaust memory or CPU:
use zenjpeg::decoder::Decoder;
use zenjpeg::types::Limits;
// Set limits individually
let decoder = Decoder::new()
.max_pixels(100_000_000) // 100 megapixels max
.max_memory(512_000_000); // 512 MB max allocation
// Or use Limits struct
let limits = Limits {
max_pixels: Some(100_000_000),
max_memory: Some(512_000_000),
max_output: None,
};
let decoder = Decoder::new().limits(limits);
Default limits:
- `max_pixels`: 100 megapixels
- `max_memory`: 512 MB
Set to 0 or None for unlimited (not recommended for untrusted input).
Cooperative Cancellation
Use Stop tokens for graceful shutdown in long-running operations:
use enough::{Stop, Unstoppable};
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
// Simple case: never cancel
let image = Decoder::new().decode(&jpeg_data, Unstoppable)?;
// Custom stop token (e.g., user clicked cancel button)
struct CancelToken {
cancelled: Arc<AtomicBool>,
}
impl Stop for CancelToken {
fn should_stop(&self) -> bool {
self.cancelled.load(Ordering::Relaxed)
}
}
let cancel = CancelToken {
cancelled: Arc::new(AtomicBool::new(false)),
};
// Decode with cancellation support
let result = Decoder::new().decode(&jpeg_data, &cancel);
// In another thread: cancel.cancelled.store(true, Ordering::Relaxed);
Encoder cancellation:
let mut encoder = config.encode_from_bytes(width, height, layout)?;
encoder.push_packed(&pixels, &cancel_token)?; // Can be cancelled during push
let jpeg = encoder.finish()?;
Per-Image Metadata (Three-Layer Pattern)
For encoding multiple images with the same config but different metadata:
use zenjpeg::encoder::{EncoderConfig, ChromaSubsampling, Exif, Orientation};
// Layer 1: Reusable config (quality, color mode, optimization settings)
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter)
.auto_optimize(true)
.progressive(true);
// Layer 2: Per-image request (metadata, limits, stop token)
// Image 1: sRGB with orientation
let jpeg1 = config.request()
.icc_profile(&srgb_icc_bytes)
.exif(Exif::build().orientation(Orientation::Rotate90))
.encode(&pixels1, 1920, 1080)?;
// Image 2: Display P3 with different metadata
let jpeg2 = config.request()
.icc_profile(&p3_icc_bytes)
.exif(Exif::build().copyright("© 2024 Example Corp"))
.encode(&pixels2, 3840, 2160)?;
// Image 3: No metadata, with cancellation
let jpeg3 = config.request()
.stop(&cancel_token)
.encode(&pixels3, 800, 600)?;
Why three layers?
- EncoderConfig - Reusable settings (quality, color mode, progressive)
- EncodeRequest - Per-image data (ICC profile, EXIF, XMP, limits, stop token)
- Encoder - Streaming execution (push rows, finish)
Request builder methods:
- `.icc_profile(&[u8])` - Borrowed ICC profile
- `.icc_profile_owned(Vec<u8>)` - Owned ICC profile
- `.exif(Exif)` - EXIF metadata
- `.xmp(&[u8])` / `.xmp_owned(Vec<u8>)` - XMP metadata
- `.stop(&dyn Stop)` - Cancellation token
- `.limits(Limits)` - Resource limits (future encoder feature)
Streaming with request:
let mut encoder = config.request()
.icc_profile(&srgb_bytes)
.encode_from_rgb::<rgb::RGB<u8>>(1920, 1080)?;
encoder.push_packed(&pixels, Unstoppable)?;
let jpeg = encoder.finish()?;
API Reference
Encoder API
All encoder types are in zenjpeg::encoder:
use zenjpeg::encoder::{
EncoderConfig, PixelLayout, Quality, ChromaSubsampling, Unstoppable
};
Quick Start
use zenjpeg::encoder::{EncoderConfig, PixelLayout, ChromaSubsampling, Unstoppable};
// Create reusable config (quality and color mode set in constructor)
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter)
.progressive(true);
// Encode from raw bytes
let mut enc = config.encode_from_bytes(1920, 1080, PixelLayout::Rgb8Srgb)?;
enc.push_packed(&rgb_bytes, Unstoppable)?;
let jpeg = enc.finish()?;
Three Encoder Entry Points
| Method | Input Type | Use Case |
|---|---|---|
| `encode_from_bytes(w, h, layout)` | `&[u8]` | Raw byte buffers |
| `encode_from_rgb::<P>(w, h)` | `rgb` crate types | `RGB<u8>`, `RGBA<f32>`, etc. |
| `encode_from_ycbcr_planar(w, h)` | `YCbCrPlanes` | Video decoder output |
Examples
use zenjpeg::encoder::{EncoderConfig, PixelLayout, ChromaSubsampling, Unstoppable};
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter);
// From raw RGB bytes
let mut enc = config.encode_from_bytes(800, 600, PixelLayout::Rgb8Srgb)?;
enc.push_packed(&rgb_bytes, Unstoppable)?;
let jpeg = enc.finish()?;
// From rgb crate types
use rgb::RGB;
let mut enc = config.encode_from_rgb::<RGB<u8>>(800, 600)?;
enc.push_packed(&pixels, Unstoppable)?;
let jpeg = enc.finish()?;
// From planar YCbCr (video pipelines)
let mut enc = config.encode_from_ycbcr_planar(1920, 1080)?;
enc.push(&planes, num_rows, Unstoppable)?;
let jpeg = enc.finish()?;
EncoderConfig Constructors
Choose one constructor based on desired color mode:
| Constructor | Color Mode | Use Case |
|---|---|---|
| `EncoderConfig::ycbcr(q, sub)` | YCbCr | Standard JPEG (most compatible) |
| `EncoderConfig::xyb(q, b_sub)` | XYB | Perceptual color space (better quality) |
| `EncoderConfig::grayscale(q)` | Grayscale | Single-channel output |
Builder Methods
| Method | Description | Default |
|---|---|---|
| `.auto_optimize(bool)` | Best quality/size - enables hybrid trellis (λ=14.5) | false |
| `.progressive(bool)` | Progressive JPEG (3-7% smaller) | true |
| `.huffman(impl Into<HuffmanStrategy>)` | Huffman table strategy | Optimize |
| `.deringing(bool)` | Overshoot deringing for documents/graphics | true |
| `.sharp_yuv(bool)` | SharpYUV downsampling | false |
| `.separate_chroma_tables(bool)` | Use 3 quant tables (Y, Cb, Cr) vs 2 (Y, shared) | true |
| `.icc_profile(bytes)` | Attach ICC profile | None |
| `.exif(exif)` | Embed EXIF metadata | None |
| `.xmp(data)` | Embed XMP metadata | None |
| `.restart_interval(n)` | MCUs between restart markers | 0 |
Quality Options
use zenjpeg::encoder::{EncoderConfig, Quality, ChromaSubsampling};
// Simple quality scale (0-100)
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter);
// Quality enum variants
let config = EncoderConfig::ycbcr(
Quality::ApproxJpegli(85.0), // Default scale
ChromaSubsampling::Quarter
);
// Or: Quality::ApproxMozjpeg(80) - Match mozjpeg output
// Or: Quality::ApproxSsim2(90.0) - Target SSIMULACRA2 score
// Or: Quality::ApproxButteraugli(1.0) - Target butteraugli distance
Pixel Layouts
| Layout | Bytes/px | Notes |
|---|---|---|
| `Rgb8Srgb` | 3 | Default, sRGB gamma |
| `Bgr8Srgb` | 3 | Windows/GDI order |
| `Rgba8Srgb` / `Rgbx8Srgb` | 4 | Alpha/pad ignored |
| `Bgra8Srgb` / `Bgrx8Srgb` | 4 | BGR + alpha/pad ignored |
| `Gray8Srgb` | 1 | Grayscale sRGB |
| `Rgb16Linear` / `Rgba16Linear` | 6/8 | 16-bit linear |
| `RgbF32Linear` / `RgbaF32Linear` | 12/16 | HDR float (0.0-1.0) |
| `YCbCr8` / `YCbCrF32` | 3/12 | Pre-converted YCbCr |
Chroma Subsampling
use zenjpeg::encoder::{EncoderConfig, ChromaSubsampling, XybSubsampling};
// YCbCr subsampling
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter); // 4:2:0 (best compression)
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::None); // 4:4:4 (best quality)
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::HalfHorizontal); // 4:2:2
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::HalfVertical); // 4:4:0
// XYB B-channel subsampling
let config = EncoderConfig::xyb(85, XybSubsampling::BQuarter); // B at 4:2:0
let config = EncoderConfig::xyb(85, XybSubsampling::Full); // No subsampling
Resource Estimation
use zenjpeg::encoder::{EncoderConfig, ChromaSubsampling};
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter);
// Typical memory estimate
let estimate = config.estimate_memory(1920, 1080);
// Guaranteed upper bound (for resource reservation)
let ceiling = config.estimate_memory_ceiling(1920, 1080);
Decoder API
Prerelease: The decoder API is behind the `decoder` feature flag and will have breaking changes. Enable with `zenjpeg = { version = "...", features = ["decoder"] }`.
All decoder types are in zenjpeg::decoder:
use zenjpeg::decoder::{Decoder, DecodeResult};
Basic Decoding
use zenjpeg::decoder::Decoder;
use enough::Unstoppable;
// Decode to u8 RGB (default)
let result = Decoder::new().decode(&jpeg_data, Unstoppable)?;
let pixels: &[u8] = result.pixels_u8().expect("u8 output");
let (width, height) = result.dimensions();
High-Precision Decoding (f32)
Use OutputTarget for f32 output with different transfer functions:
use zenjpeg::decoder::{Decoder, OutputTarget};
use enough::Unstoppable;
// sRGB gamma-encoded f32 (0.0-1.0 range)
let result = Decoder::new()
.output_target(OutputTarget::SrgbF32)
.decode(&jpeg_data, Unstoppable)?;
let pixels: &[f32] = result.pixels_f32().expect("f32 output");
// Linear light f32 (for compositing, HDR)
let result = Decoder::new()
.output_target(OutputTarget::LinearF32)
.decode(&jpeg_data, Unstoppable)?;
// Convert f32 to u8 or u16 when needed
let u8_pixels: Option<Vec<u8>> = result.to_u8();
let u16_pixels: Option<Vec<u16>> = result.to_u16();
YCbCr Output (Zero Color Conversion)
For video pipelines or re-encoding:
use zenjpeg::decoder::{Decoder, DecodedYCbCr};
let ycbcr: DecodedYCbCr = Decoder::new().decode_to_ycbcr_f32(&jpeg_data)?;
// Access Y, Cb, Cr planes directly (f32, range [-128, 127])
Reading JPEG Info Without Decoding
let info = Decoder::new().read_info(&jpeg_data)?;
println!("{}x{}, {} components", info.width, info.height, info.num_components);
Decoder Options
| Method | Description | Default |
|---|---|---|
| `.output_format(fmt)` | Output pixel format | Rgb |
| `.fancy_upsampling(bool)` | Smooth chroma upsampling | true |
| `.block_smoothing(bool)` | DCT block edge smoothing | false |
| `.apply_icc(bool)` | Apply embedded ICC profile | true |
| `.dequant_bias(bool)` | Laplacian dequantization biases (see below) | false |
| `.max_pixels(n)` | Pixel count limit (DoS protection) | 100M |
| `.max_memory(n)` | Memory limit in bytes | 512 MB |
Output Formats
| PixelFormat | Bytes/px | Description |
|---|---|---|
| `Rgb` | 3 | R-G-B (default) |
| `Bgr` | 3 | B-G-R (Windows/GDI) |
| `Rgba` | 4 | R-G-B-A, alpha = 255 |
| `Bgra` | 4 | B-G-R-A, alpha = 255 |
| `Bgrx` | 4 | B-G-R-X, pad = 255 |
| `Gray` | 1 | Grayscale |
All formats work with buffered decode (.decode()), the fast i16 path,
and the streaming scanline reader.
Decoded Image Methods
let image = Decoder::new().decode(&jpeg_data, Unstoppable)?;
image.width() // Image width
image.height() // Image height
image.dimensions() // (width, height) tuple
image.pixels() // &[u8] pixel data
image.bytes_per_pixel() // Bytes per pixel for format
image.stride() // Bytes per row
DecoderConfig (Advanced)
use zenjpeg::decoder::{Decoder, DecoderConfig};
use enough::Unstoppable;
// Most users should use the builder methods instead:
let image = Decoder::new()
.fancy_upsampling(true)
.block_smoothing(false)
.apply_icc(true)
.dequant_bias(false)
.max_pixels(100_000_000)
.max_memory(512 * 1024 * 1024)
.decode(&jpeg_data, Unstoppable)?;
// Or construct DecoderConfig directly:
let config = DecoderConfig::default();
Streaming Decode (Scanline Reader)
Decode row-by-row for minimal memory usage:
use zenjpeg::decoder::Decoder;
use imgref::ImgRefMut;
let mut reader = Decoder::new().scanline_reader(&jpeg_data)?;
let (w, h) = (reader.width() as usize, reader.height() as usize);
let mut buf = vec![0u8; w * h * 4];
let mut rows = 0;
while !reader.is_finished() {
let slice = &mut buf[rows * w * 4..];
let output = ImgRefMut::new(slice, w * 4, h - rows);
rows += reader.read_rows_bgra8(output)?;
}
| Method | Bytes/px | Format |
|---|---|---|
| `read_rows_rgb8()` | 3 | R-G-B |
| `read_rows_bgr8()` | 3 | B-G-R |
| `read_rows_rgbx8()` | 4 | R-G-B-X (pad=255) |
| `read_rows_rgba8()` | 4 | R-G-B-A (A=255) |
| `read_rows_bgra8()` | 4 | B-G-R-A (A=255) |
| `read_rows_bgrx8()` | 4 | B-G-R-X (pad=255) |
| `read_rows_rgba_f32()` | 16 | Linear f32 RGBA |
| `read_rows_gray8()` | 1 | Grayscale u8 |
| `read_rows_gray_f32()` | 4 | Grayscale f32 |
Performance
Encoding Speed
| Image Size | Sequential | Progressive | Notes |
|---|---|---|---|
| 512x512 | 118 MP/s | 58 MP/s | Small images |
| 1024x1024 | 92 MP/s | 36 MP/s | Medium images |
| 2048x2048 | 87 MP/s | 46 MP/s | Large images |
Sequential vs Progressive
| Quality | Seq Size | Prog Size | Prog Δ | Prog Slowdown |
|---|---|---|---|---|
| Q50 | 322 KB | 313 KB | -2.8% | 2.5x |
| Q70 | 429 KB | 416 KB | -3.0% | 2.0x |
| Q85 | 586 KB | 568 KB | -3.1% | 2.1x |
| Q95 | 915 KB | 887 KB | -3.1% | 2.2x |
Progressive produces ~3% smaller files at the same quality, but takes ~2x longer.
Recommendation:
- Use Sequential for: real-time encoding, high throughput
- Use Progressive for: web delivery, storage optimization
Decoding Speed
The default decode path uses fast integer IDCT (matching zune-jpeg performance).
The f32 pipeline is used for XYB images or when dequant_bias(true) is enabled.
| Mode | 2048x2048 | vs zune-jpeg | Notes |
|---|---|---|---|
| Scanline 4:2:0 | 4.03ms | 0.99x | Matches zune-jpeg |
| Scanline 4:4:4 | 5.78ms | 0.91x | Beats zune-jpeg |
| Buffered fast | 4.72ms | 1.15x | Two-pass overhead |
| Buffered default | 5.51ms | 1.35x | f32 upsampling |
Dequantization Bias
Decoder::new().dequant_bias(true) enables optimal Laplacian dequantization
biases (Price & Rabbani 2000). This
computes per-coefficient biases from DCT coefficient statistics and applies them
during f32 dequantization, matching C++ jpegli's decoder behavior.
Tradeoff: Bypasses the fast integer IDCT path. The quality difference vs the default integer IDCT is image-dependent and small in either direction:
| Quality | Default SSIM2 | +bias SSIM2 | C++ jpegli | bias vs default |
|---|---|---|---|---|
| Q50 | 37.28 | 35.95 | 36.01 | -1.32 pts |
| Q85 | 50.45 | 50.18 | 50.21 | -0.27 pts |
| Q95 | 53.28 | 53.25 | 53.27 | -0.03 pts |
(frymire 1118x1105, SSIMULACRA2 vs original, higher = better)
The bias path consistently tracks C++ jpegli output within 0.02-0.11 SSIMULACRA2 points. Use it when you need decode output to match C++ jpegli, or when processing pipelines assume jpegli-style reconstruction.
Table Optimization
The EncodingTables API provides fine-grained control over quantization and zero-bias
tables for researching better encoding parameters.
Quick Start
use zenjpeg::encoder::{EncoderConfig, ChromaSubsampling};
use zenjpeg::encoder::tuning::{EncodingTables, ScalingParams, dct};
// Start from defaults and modify
let mut tables = EncodingTables::default_ycbcr();
// Scale a specific coefficient (component 0 = Y, k = coefficient index)
tables.scale_quant(0, 5, 1.2); // 20% higher quantization at position 5
// Or use exact quantization values (no quality scaling)
tables.scaling = ScalingParams::Exact;
tables.quant.c0[0] = 16.0; // DC quantization for Y
let config = EncoderConfig::ycbcr(85.0, ChromaSubsampling::Quarter)
.tables(Box::new(tables));
Understanding the Parameters
Quantization Tables (quant): 64 coefficients per component (Y/Cb/Cr or X/Y/B)
- Lower values = more precision = larger file
- Higher values = more compression = smaller file
- DC (index 0) affects brightness uniformity
- Low frequencies (indices 1, 8, 9, 16, 17) affect gradients
- High frequencies affect edges and texture
Zero-Bias Tables (`zero_bias_mul`, `zero_bias_offset_*`): control rounding behavior during quantization
- `zero_bias_mul[k]` multiplies the dead zone around zero
- Higher values = more aggressive zeroing of small coefficients = smaller files
- `zero_bias_offset_dc`/`ac` add to the threshold before zeroing
Scaling Params:
- `ScalingParams::Scaled { global_scale, frequency_exponents }` - quality-dependent scaling
- `ScalingParams::Exact` - use raw values (must be in valid u16 range)
DCT Coefficient Layout
Position in 8x8 block (row-major index k):
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63
k=0 is DC (average brightness)
k=1,8 are lowest AC frequencies (horizontal/vertical gradients)
k=63 is highest frequency (diagonal detail)
Use dct::freq_distance(k) to get Manhattan distance from DC (0-14).
Use dct::IMPORTANCE_ORDER for coefficients sorted by perceptual impact.
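To make the indexing concrete, here is a small re-derivation of what `row_col` and `freq_distance` return for the layout above (illustrative code, not the crate's implementation):

```rust
/// (row, col) of coefficient k in the 8x8 block, row-major.
fn row_col(k: usize) -> (usize, usize) {
    (k / 8, k % 8)
}

/// Manhattan distance of coefficient k from DC (k = 0); ranges 0..=14.
fn freq_distance(k: usize) -> usize {
    let (r, c) = row_col(k);
    r + c
}

fn main() {
    assert_eq!(freq_distance(0), 0); // DC
    assert_eq!(freq_distance(1), 1); // lowest horizontal AC
    assert_eq!(freq_distance(8), 1); // lowest vertical AC
    assert_eq!(freq_distance(63), 14); // highest diagonal frequency
    assert_eq!(row_col(17), (2, 1)); // third row, second column
}
```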
Research Methodology
1. Corpus-Based Optimization
use zenjpeg::encoder::tuning::{EncodingTables, dct};
fn evaluate_tables(tables: &EncodingTables, corpus: &[Image]) -> f64 {
let mut total_score = 0.0;
for image in corpus {
let jpeg = encode_with_tables(image, tables);
let score = ssimulacra2_per_byte(&jpeg, image); // quality/size
total_score += score;
}
total_score / corpus.len() as f64
}
// Grid search over coefficient k
fn optimize_coefficient(k: usize, component: usize, corpus: &[Image]) {
let mut best_score = f64::MIN;
let mut best_value = 1.0;
for scale in [0.5, 0.75, 1.0, 1.25, 1.5, 2.0] {
let mut tables = EncodingTables::default_ycbcr();
tables.scale_quant(component, k, scale);
let score = evaluate_tables(&tables, corpus);
if score > best_score {
best_score = score;
best_value = scale;
}
}
println!("Coefficient {} best scale: {}", k, best_value);
}
2. Gradient-Free Optimization
For automated discovery, use derivative-free optimizers:
// Using argmin crate with Nelder-Mead
use argmin::solver::neldermead::NelderMead;
fn objective(params: &[f64], corpus: &[Image]) -> f64 {
let mut tables = EncodingTables::default_ycbcr();
// Map params to table modifications (e.g., first 10 most impactful coefficients)
for (i, &scale) in params.iter().enumerate() {
let k = dct::IMPORTANCE_ORDER[i + 1]; // Skip DC
tables.scale_quant(0, k, scale as f32); // Y component
}
-evaluate_tables(&tables, corpus) // Negative because we minimize
}
Recommended optimizers:
- CMA-ES (Covariance Matrix Adaptation): Best for 10-50 parameters
- Nelder-Mead: Good for quick exploration, 5-20 parameters
- Differential Evolution: Robust, handles constraints well
- Bayesian Optimization: Sample-efficient when evaluations are expensive
3. Image-Adaptive Tables
Different image categories may benefit from different tables:
| Content Type | Strategy |
|---|---|
| Photographs | Lower DC/low-freq quant, preserve gradients |
| Graphics/UI | Higher high-freq quant, preserve edges |
| Text on photos | Balance - preserve both |
| Skin tones | Lower Cb/Cr quant in mid frequencies |
fn classify_and_encode(image: &Image) -> Vec<u8> {
let tables = match classify_content(image) {
ContentType::Photo => tables_optimized_for_photos(),
ContentType::Graphic => tables_optimized_for_graphics(),
ContentType::Mixed => EncodingTables::default_ycbcr(),
};
encode_with_tables(image, &tables)
}
4. Perceptual Weighting
Use quality metrics to weight optimization:
// SSIMULACRA2 weights certain frequencies more than others
// Butteraugli penalizes different artifacts
fn multi_metric_score(jpeg: &[u8], original: &Image) -> f64 {
let ssim2 = ssimulacra2(jpeg, original);
let butteraugli = butteraugli_distance(jpeg, original);
let size = jpeg.len() as f64;
// Combine: higher quality, lower butteraugli, smaller size
(ssim2 * 100.0 - butteraugli * 10.0) / (size / 1000.0)
}
Ideas for Research
- Content-aware table selection: Train a classifier to select optimal tables
- Quality-dependent tables: Different tables for Q50 vs Q90
- Resolution-dependent: High-res images may need different high-freq handling
- Per-block adaptive: Use AQ to modulate per-block quantization
- Machine learning: Use differentiable JPEG approximations to train tables
- Genetic algorithms: Evolve table populations over a corpus
- Transfer learning: Start from optimized tables for similar content
Available Helpers
use zenjpeg::encoder::tuning::dct;
// Coefficient analysis
dct::freq_distance(k) // Manhattan distance from DC (0-14)
dct::row_col(k) // (row, col) in 8x8 block
dct::to_zigzag(k) // Row-major to zigzag order
dct::from_zigzag(z) // Zigzag to row-major
dct::IMPORTANCE_ORDER // Coefficients by perceptual impact
// Table manipulation
tables.scale_quant(c, k, factor) // Scale one coefficient
tables.perturb_quant(c, k, delta) // Add delta to coefficient
tables.blend(&other, t) // Linear interpolation (0.0-1.0)
tables.quant.scale_component(c, f) // Scale entire component
tables.quant.scale_all(f) // Scale all coefficients
Overshoot Deringing
Enabled by default. This technique was pioneered by @kornel in mozjpeg and significantly improves quality for documents, screenshots, and graphics without any quality penalty for photographic content.
The Problem
JPEG uses DCT (Discrete Cosine Transform) which represents pixel blocks as sums of cosine waves. Hard edges—like text on a white background—create high-frequency components that are difficult to represent accurately. The result is "ringing": oscillating artifacts that look like halos or waves emanating from sharp transitions.
The Insight
JPEG decoders clamp output values to 0-255. This means to display white (255), any encoded value ≥255 works identically after clamping. The encoder can exploit this "headroom" above the displayable range.
The Solution
Instead of encoding a flat plateau at the maximum value, deringing creates a smooth curve that "overshoots" above the maximum:
- The peak (above 255) gets clamped to 255 on decode
- The result looks identical to the original
- But the smooth curve compresses much better with fewer artifacts!
This is analogous to "anti-clipping" in audio processing.
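The clamping argument can be seen with a toy example (hypothetical sample values; the encoder's actual overshoot curve is more sophisticated):

```rust
// Decoders clamp reconstructed samples to 0-255, so any value >= 255
// displays as pure white. A smooth curve that overshoots past 255 decodes
// to the same pixels as a hard plateau at 255, but needs fewer
// high-frequency DCT coefficients to represent.
fn decode_clamp(v: f32) -> u8 {
    v.round().clamp(0.0, 255.0) as u8
}

fn main() {
    let hard_plateau = [250.0_f32, 255.0, 255.0, 255.0];
    let smooth_overshoot = [250.0_f32, 258.0, 271.0, 262.0];
    let a: Vec<u8> = hard_plateau.iter().map(|&v| decode_clamp(v)).collect();
    let b: Vec<u8> = smooth_overshoot.iter().map(|&v| decode_clamp(v)).collect();
    assert_eq!(a, b); // identical after clamping
}
```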
When It Helps Most
- Documents and screenshots with white backgrounds
- Text and graphics with hard edges
- Any image with saturated regions (pixels at 0 or 255)
- UI elements with sharp corners
Usage
Deringing is on by default. To disable it (not recommended):
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter)
.deringing(false); // Disable deringing
C++ Parity Status
Tested against C++ jpegli on frymire.png (1118x1105):
| Metric | Rust | C++ | Difference |
|---|---|---|---|
| File size (Q85 seq) | 586.3 KB | 586.7 KB | -0.1% |
| File size (Q85 prog) | 568.2 KB | 565.1 KB | +0.5% |
| SSIM2 (Q85) | 69.0 | 69.0 | identical |
Quality is identical (mean <0.5% difference); file sizes within 2%.
Comparing with C++ jpegli: 2 vs 3 Quantization Tables
When comparing output between zenjpeg and C++ jpegli, use jpegli_set_distance()
in C++, not jpeg_set_quality(). Here's why:
The issue:
- `jpeg_set_quality()` in C++ uses 2 chroma tables (Cb and Cr share the same table)
- `jpegli_set_distance()` in C++ uses 3 tables (separate Y, Cb, Cr tables)
- zenjpeg always uses 3 tables
Using jpeg_set_quality() for comparison will show ~4% file size differences and
different quantization behavior because the encoders are configured differently.
Correct comparison (FFI):
// C++ - use distance-based quality (3 tables)
jpegli_set_distance(&cinfo, 1.0, JPEGLI_TRUE); // distance 1.0 ≈ quality 90
// NOT: jpeg_set_quality(&cinfo, 90, TRUE); // 2 tables - invalid comparison!
Quality to distance conversion:
fn quality_to_distance(q: f32) -> f32 {
if q >= 100.0 { 0.01 }
else if q >= 30.0 { 0.1 + (100.0 - q) * 0.09 }
else { 53.0 / 3000.0 * q * q - 23.0 / 20.0 * q + 25.0 }
}
// q90 → distance 1.0, q75 → distance 2.35
With proper distance-based comparison, size and quality differences are typically within ±2%.
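The conversion can be sanity-checked against its documented anchor points (same function as above; tolerances account for f32 rounding):

```rust
// Same piecewise quality-to-distance mapping as shown above.
fn quality_to_distance(q: f32) -> f32 {
    if q >= 100.0 { 0.01 }
    else if q >= 30.0 { 0.1 + (100.0 - q) * 0.09 }
    else { 53.0 / 3000.0 * q * q - 23.0 / 20.0 * q + 25.0 }
}

fn main() {
    // Documented anchors: q90 -> distance 1.0, q75 -> distance 2.35.
    assert!((quality_to_distance(90.0) - 1.0).abs() < 1e-4);
    assert!((quality_to_distance(75.0) - 2.35).abs() < 1e-4);
    // The linear and quadratic branches agree at the q = 30 boundary.
    let q = 30.0_f32;
    let quadratic = 53.0 / 3000.0 * q * q - 23.0 / 20.0 * q + 25.0;
    assert!((quality_to_distance(q) - quadratic).abs() < 1e-4);
}
```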
Matching jpeg_set_quality() behavior:
If you need output that matches tools using jpeg_set_quality() (2 tables),
use the .separate_chroma_tables(false) option:
// Match jpeg_set_quality() behavior (2 tables: Y, shared chroma)
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter)
.separate_chroma_tables(false);
Feature Flags
| Feature | Default | Description | When to Use |
|---|---|---|---|
| `decoder` | ❌ No | JPEG decoding - enables the `zenjpeg::decoder` module | Required for any decode operations |
| `std` | ✅ Yes | Standard library support | Disable for no_std embedded targets |
| `archmage-simd` | ✅ Yes | Safe SIMD via archmage (~10-20% faster) | Keep enabled for best performance |
| `cms-lcms2` | ✅ Yes | ICC color management via lcms2 | XYB decoding, wide-gamut images |
| `cms-moxcms` | ❌ No | Pure Rust color management | no_std or avoiding C dependencies |
| `parallel` | ❌ No | Multi-threaded encoding via rayon | Large images (4K+), server workloads |
| `ultrahdr` | ❌ No | UltraHDR HDR gain map support | Encoding/decoding HDR JPEGs |
| `trellis` | ✅ Yes | Trellis quantization (mozjpeg-style) | Keep enabled for best compression |
| `yuv` | ✅ Yes | SharpYUV chroma downsampling | Keep enabled for quality |
By default, the crate uses #![forbid(unsafe_code)]. SIMD is provided via the safe wide crate, with archmage-simd (default) adding token-based intrinsics for ~10-20% speedup.
Common Configurations
# Decode + encode (most common)
[dependencies]
zenjpeg = { version = "0.6", features = ["decoder"] }
# Encode only (default)
[dependencies]
zenjpeg = "0.6"
# High-performance server
[dependencies]
zenjpeg = { version = "0.6", features = ["decoder", "parallel"] }
# Embedded / no_std
[dependencies]
zenjpeg = { version = "0.6", default-features = false, features = ["cms-moxcms"] }
# UltraHDR support
[dependencies]
zenjpeg = { version = "0.6", features = ["decoder", "ultrahdr"] }
Encoder Status
| Feature | Status |
|---|---|
| Baseline JPEG | Working |
| Progressive JPEG | Working |
| Adaptive quantization | Working |
| Huffman optimization | Working |
| 4:4:4 / 4:2:0 / 4:2:2 / 4:4:0 | Working |
| XYB color space | Working |
| Grayscale | Working |
| Custom quant tables | Working |
| ICC profile embedding | Working |
| YCbCr planar input | Working |
Decoder Status
Prerelease: Enable with `features = ["decoder"]`. The API will have breaking changes.
| Feature | Status |
|---|---|
| Baseline JPEG | Working |
| Progressive JPEG | Working |
| All subsampling modes | Working |
| Restart markers | Working |
| ICC profile extraction | Working |
| XYB decoding | Working (with CMS) |
| f32 output | Working |
Future Optimization Opportunities
Profiling against C++ jpegli reveals these bottlenecks (2K image, progressive 4:2:0):
| Area | Rust | C++ | Gap | Notes |
|---|---|---|---|---|
| RGB→YCbCr | 11.7% | 1.7% | 6.9x | Biggest opportunity |
| Adaptive quantization | 28.6% | 12.1% | 2.4x | Algorithm efficiency |
| Huffman freq counting | 5.7% | 0.5% | 11x | Already SIMD, still slow |
| DCT | 7.3% | 5.5% | 1.3x | Reasonable |
| Entropy encoding | 10.9% | 35.9% | — | C++ slower here |
Crates to investigate for RGB→YCbCr:
- `yuv` (0.8.9) - Faster than libyuv; AVX-512/AVX2/SSE/NEON
- `yuvutils-rs` - AVX2/SSE/NEON, optional AVX-512
- `dcv-color-primitives` - AWS; AVX2/NEON
Current gap: Rust is ~20% slower than C++ jpegli (1.2x median, range 1.05x-1.43x per criterion benchmarks).
Development
Verify C++ Parity
# Quick parity test (no C++ build needed)
cargo test --release --test cpp_parity_locked
# Full comparison (requires C++ jpegli built)
cargo test --release --test comprehensive_cpp_comparison -- --nocapture --ignored
Building C++ Reference (Optional)
git submodule update --init --recursive
cd internal/jpegli-cpp && mkdir -p build && cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DJPEGXL_ENABLE_TOOLS=ON ..
ninja cjpegli djpegli
License
Sustainable, large-scale open source work requires a funding model, and I have been doing this full-time for 15 years. If you are using this for closed-source development AND make over $1 million per year, you'll need to buy a commercial license at https://www.imazen.io/pricing
Commercial licenses are similar to the Apache 2 license but company-specific, and on a sliding scale. You can also use this under the AGPL v3.
Acknowledgments
Originally a port of jpegli from the JPEG XL project by Google (BSD-3-Clause). After six rewrites, this is now an independent project that shares ideas but little code with the original.
AI Disclosure
Developed with assistance from Claude (Anthropic). Extensively tested against C++ reference with 340+ tests. Report issues at https://github.com/imazen/zenjpeg/issues