#jpeg #jpeg-decoder #jpeg-encoder #image #image-compression #jpeg-codec

no-std zenjpeg

Pure Rust JPEG encoder/decoder with perceptual optimizations

6 releases (breaking)

new 0.6.0 Feb 8, 2026
0.5.0 Feb 4, 2026
0.4.0 Feb 4, 2026
0.3.1 Feb 1, 2026
0.1.0 Dec 27, 2025

#1466 in Images

42 downloads per month

AGPL-3.0-or-later

3MB
59K SLoC

zenjpeg


A pure Rust JPEG encoder and decoder with perceptual optimizations.

Important: The decoder requires the decoder feature flag:

[dependencies]
zenjpeg = { version = "0.6", features = ["decoder"] }

See Feature Flags for details.

Note: This crate was previously published as jpegli-rs. If migrating, update your imports from use jpegli:: to use zenjpeg::.

Heritage and Divergence

This project started as a port of jpegli, Google's improved JPEG encoder from the JPEG XL project. After six rewrites it has diverged significantly and is now an independent project.

Ideas adopted from jpegli:

  • Adaptive quantization (content-aware bit allocation)
  • XYB color space with ICC profiles (progressive mode recommended for best compression)
  • Perceptually-tuned quantization tables
  • Zero-bias strategies for coefficient rounding

Ideas adopted from mozjpeg:

  • Overshoot deringing for documents/graphics
  • Trellis quantization for optimal coefficient selection
  • Hybrid trellis mode (experimental, see Trellis Modes below)

Where we went our own way:

  • Pure Rust, #![forbid(unsafe_code)] unconditionally (SIMD via safe archmage tokens)
  • Streaming encoder API for memory efficiency (process images row-by-row)
  • Portable SIMD via wide crate instead of platform intrinsics
  • Parallel encoding support
  • UltraHDR support (HDR gain maps for backward-compatible HDR JPEGs)
  • Independent optimizations and bug fixes

Features

  • Pure Rust - No C/C++ dependencies, builds anywhere Rust does
  • Perceptual optimization - Adaptive quantization for better visual quality at smaller sizes
  • Trellis quantization - Optimal coefficient selection from mozjpeg
  • Overshoot deringing - Eliminates ringing artifacts on documents and graphics (enabled by default)
  • Backward compatible - Produces standard JPEG files readable by any decoder
  • SIMD accelerated - Portable SIMD via wide crate
  • Streaming API - Memory-efficient row-by-row encoding for large images
  • Parallel encoding - Multi-threaded for large images (1024x1024+)
  • UltraHDR support - Encode/decode HDR gain maps (optional ultrahdr feature)
  • Color management - Optional ICC profile support

Known Limitations

  • XYB color space - With progressive mode, matches or beats C++ jpegli file sizes. Baseline mode is 2-3% larger.
  • XYB decoder speed - XYB images use f32 pipeline; standard JPEG decoding uses fast integer IDCT.

Trellis Modes

zenjpeg supports three quantization modes:

Standard (jpegli-style)

Default mode. Uses adaptive quantization with perceptual zero-bias. Good balance of speed and quality.

let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter);

Standalone Trellis (mozjpeg-style)

Rate-distortion optimized coefficient selection. Typically 10-15% smaller files at equivalent quality. Slightly slower due to dynamic programming optimization.

use zenjpeg::encoder::{ExpertConfig, OptimizationPreset, ColorMode, ChromaSubsampling};

let expert = ExpertConfig::from_preset(OptimizationPreset::MozjpegBaseline, 85);
let config = expert.to_encoder_config(ColorMode::YCbCr {
    subsampling: ChromaSubsampling::Quarter,
});

Hybrid Trellis

Combines jpegli's adaptive quantization with mozjpeg's trellis quantization. This is our best mode; enable it via .auto_optimize(true):

  • +1.5 SSIM2 points vs jpegli at matched file size
  • -1.5% to -2% smaller files at matched quality
  • Works across q50-q95 range

use zenjpeg::encoder::{EncoderConfig, ChromaSubsampling};

// Recommended: use auto_optimize for best results
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter)
    .auto_optimize(true);

Quick Start

Encode

use zenjpeg::encoder::{EncoderConfig, PixelLayout, ChromaSubsampling, Unstoppable};

// Best quality/size with auto_optimize
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter)
    .auto_optimize(true);
let mut enc = config.encode_from_bytes(width, height, PixelLayout::Rgb8Srgb)?;
enc.push_packed(&rgb_bytes, Unstoppable)?;
let jpeg_bytes: Vec<u8> = enc.finish()?;

Decode

Requires features = ["decoder"] (prerelease API).

use zenjpeg::decoder::Decoder;
use enough::Unstoppable;

let result = Decoder::new().decode(&jpeg_bytes, Unstoppable)?;
let rgb_pixels: &[u8] = result.pixels_u8().expect("u8 output");
let (width, height) = result.dimensions();

Resource Limits and Cancellation

Resource Limits (DoS Protection)

Protect against malicious images that could exhaust memory or CPU:

use zenjpeg::decoder::Decoder;
use zenjpeg::types::Limits;

// Set limits individually
let decoder = Decoder::new()
    .max_pixels(100_000_000)      // 100 megapixels max
    .max_memory(512_000_000);     // 512 MB max allocation

// Or use Limits struct
let limits = Limits {
    max_pixels: Some(100_000_000),
    max_memory: Some(512_000_000),
    max_output: None,
};
let decoder = Decoder::new().limits(limits);

Default limits:

  • max_pixels: 100 megapixels
  • max_memory: 512 MB

Set to 0 or None for unlimited (not recommended for untrusted input).
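The pre-decode check these limits imply can be sketched as follows. This is an illustration only, not zenjpeg's internals; `within_pixel_limit` is a hypothetical helper mirroring the `max_pixels: Option<u64>` semantics above:

```rust
/// Hypothetical pixel-count limit check: `None` means unlimited,
/// matching the Limits semantics described above.
fn within_pixel_limit(width: u64, height: u64, max_pixels: Option<u64>) -> bool {
    match max_pixels {
        None => true, // unlimited (not recommended for untrusted input)
        Some(limit) => width.saturating_mul(height) <= limit,
    }
}

fn main() {
    // 20000 x 20000 = 400 megapixels, over the default 100 MP limit
    assert!(!within_pixel_limit(20_000, 20_000, Some(100_000_000)));
    // 1920 x 1080 is well under the limit
    assert!(within_pixel_limit(1920, 1080, Some(100_000_000)));
    println!("ok");
}
```

Rejecting on header dimensions, before allocating any pixel buffers, is what makes such limits effective against decompression-bomb inputs.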

Cooperative Cancellation

Use Stop tokens for graceful shutdown in long-running operations:

use enough::{Stop, Unstoppable};
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};

// Simple case: never cancel
let image = Decoder::new().decode(&jpeg_data, Unstoppable)?;

// Custom stop token (e.g., user clicked cancel button)
struct CancelToken {
    cancelled: Arc<AtomicBool>,
}

impl Stop for CancelToken {
    fn should_stop(&self) -> bool {
        self.cancelled.load(Ordering::Relaxed)
    }
}

let cancel = CancelToken {
    cancelled: Arc::new(AtomicBool::new(false)),
};

// Decode with cancellation support
let result = Decoder::new().decode(&jpeg_data, &cancel);

// In another thread: cancel.cancelled.store(true, Ordering::Relaxed);

Encoder cancellation:

let mut encoder = config.encode_from_bytes(width, height, layout)?;
encoder.push_packed(&pixels, &cancel_token)?;  // Can be cancelled during push
let jpeg = encoder.finish()?;

Per-Image Metadata (Three-Layer Pattern)

For encoding multiple images with the same config but different metadata:

use zenjpeg::encoder::{EncoderConfig, ChromaSubsampling, Exif, Orientation};

// Layer 1: Reusable config (quality, color mode, optimization settings)
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter)
    .auto_optimize(true)
    .progressive(true);

// Layer 2: Per-image request (metadata, limits, stop token)
// Image 1: sRGB with orientation
let jpeg1 = config.request()
    .icc_profile(&srgb_icc_bytes)
    .exif(Exif::build().orientation(Orientation::Rotate90))
    .encode(&pixels1, 1920, 1080)?;

// Image 2: Display P3 with different metadata
let jpeg2 = config.request()
    .icc_profile(&p3_icc_bytes)
    .exif(Exif::build().copyright("© 2024 Example Corp"))
    .encode(&pixels2, 3840, 2160)?;

// Image 3: No metadata, with cancellation
let jpeg3 = config.request()
    .stop(&cancel_token)
    .encode(&pixels3, 800, 600)?;

Why three layers?

  1. EncoderConfig - Reusable settings (quality, color mode, progressive)
  2. EncodeRequest - Per-image data (ICC profile, EXIF, XMP, limits, stop token)
  3. Encoder - Streaming execution (push rows, finish)

Request builder methods:

  • .icc_profile(&[u8]) - Borrowed ICC profile
  • .icc_profile_owned(Vec<u8>) - Owned ICC profile
  • .exif(Exif) - EXIF metadata
  • .xmp(&[u8]) / .xmp_owned(Vec<u8>) - XMP metadata
  • .stop(&dyn Stop) - Cancellation token
  • .limits(Limits) - Resource limits (planned encoder feature)

Streaming with request:

let mut encoder = config.request()
    .icc_profile(&srgb_bytes)
    .encode_from_rgb::<rgb::RGB<u8>>(1920, 1080)?;

encoder.push_packed(&pixels, Unstoppable)?;
let jpeg = encoder.finish()?;

API Reference

Encoder API

All encoder types are in zenjpeg::encoder:

use zenjpeg::encoder::{
    EncoderConfig, PixelLayout, Quality, ChromaSubsampling, Unstoppable
};

Quick Start

use zenjpeg::encoder::{EncoderConfig, PixelLayout, ChromaSubsampling, Unstoppable};

// Create reusable config (quality and color mode set in constructor)
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter)
    .progressive(true);

// Encode from raw bytes
let mut enc = config.encode_from_bytes(1920, 1080, PixelLayout::Rgb8Srgb)?;
enc.push_packed(&rgb_bytes, Unstoppable)?;
let jpeg = enc.finish()?;

Three Encoder Entry Points

Method Input Type Use Case
encode_from_bytes(w, h, layout) &[u8] Raw byte buffers
encode_from_rgb::<P>(w, h) rgb crate types RGB<u8>, RGBA<f32>, etc.
encode_from_ycbcr_planar(w, h) YCbCrPlanes Video decoder output

Examples

use zenjpeg::encoder::{EncoderConfig, PixelLayout, ChromaSubsampling, Unstoppable};

let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter);

// From raw RGB bytes
let mut enc = config.encode_from_bytes(800, 600, PixelLayout::Rgb8Srgb)?;
enc.push_packed(&rgb_bytes, Unstoppable)?;
let jpeg = enc.finish()?;

// From rgb crate types
use rgb::RGB;
let mut enc = config.encode_from_rgb::<RGB<u8>>(800, 600)?;
enc.push_packed(&pixels, Unstoppable)?;
let jpeg = enc.finish()?;

// From planar YCbCr (video pipelines)
let mut enc = config.encode_from_ycbcr_planar(1920, 1080)?;
enc.push(&planes, num_rows, Unstoppable)?;
let jpeg = enc.finish()?;

EncoderConfig Constructors

Choose one constructor based on desired color mode:

Constructor Color Mode Use Case
EncoderConfig::ycbcr(q, sub) YCbCr Standard JPEG (most compatible)
EncoderConfig::xyb(q, b_sub) XYB Perceptual color space (better quality)
EncoderConfig::grayscale(q) Grayscale Single-channel output

Builder Methods

Method Description Default
.auto_optimize(bool) Best quality/size - enables hybrid trellis λ=14.5 false
.progressive(bool) Progressive JPEG (3-7% smaller) true
.huffman(impl Into<HuffmanStrategy>) Huffman table strategy Optimize
.deringing(bool) Overshoot deringing for documents/graphics true
.sharp_yuv(bool) SharpYUV downsampling false
.separate_chroma_tables(bool) Use 3 quant tables (Y, Cb, Cr) vs 2 (Y, shared) true
.icc_profile(bytes) Attach ICC profile None
.exif(exif) Embed EXIF metadata None
.xmp(data) Embed XMP metadata None
.restart_interval(n) MCUs between restart markers 0

Quality Options

use zenjpeg::encoder::{EncoderConfig, Quality, ChromaSubsampling};

// Simple quality scale (0-100)
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter);

// Quality enum variants
let config = EncoderConfig::ycbcr(
    Quality::ApproxJpegli(85.0),  // Default scale
    ChromaSubsampling::Quarter
);
// Or: Quality::ApproxMozjpeg(80)      - Match mozjpeg output
// Or: Quality::ApproxSsim2(90.0)      - Target SSIMULACRA2 score
// Or: Quality::ApproxButteraugli(1.0) - Target butteraugli distance

Pixel Layouts

Layout Bytes/px Notes
Rgb8Srgb 3 Default, sRGB gamma
Bgr8Srgb 3 Windows/GDI order
Rgba8Srgb / Rgbx8Srgb 4 Alpha/pad ignored
Bgra8Srgb / Bgrx8Srgb 4 BGR + alpha/pad ignored
Gray8Srgb 1 Grayscale sRGB
Rgb16Linear / Rgba16Linear 6/8 16-bit linear
RgbF32Linear / RgbaF32Linear 12/16 HDR float (0.0-1.0)
YCbCr8 / YCbCrF32 3/12 Pre-converted YCbCr
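Buffer sizing follows directly from the bytes-per-pixel column: a packed input buffer must hold width × height × bytes/px. A minimal sketch (the enum below mirrors a subset of the table for illustration; zenjpeg exposes its own PixelLayout type):

```rust
/// Illustrative subset of the layouts in the table above.
#[derive(Clone, Copy)]
enum Layout {
    Rgb8Srgb,     // 3 bytes/px
    Rgba8Srgb,    // 4 bytes/px, alpha ignored
    Gray8Srgb,    // 1 byte/px
    RgbF32Linear, // 12 bytes/px: 3 channels x 4-byte f32
}

/// Bytes per pixel as listed in the table.
fn bytes_per_pixel(layout: Layout) -> usize {
    match layout {
        Layout::Rgb8Srgb => 3,
        Layout::Rgba8Srgb => 4,
        Layout::Gray8Srgb => 1,
        Layout::RgbF32Linear => 12,
    }
}

fn main() {
    // A packed 1920x1080 RGB buffer needs w * h * 3 bytes.
    let needed = 1920usize * 1080 * bytes_per_pixel(Layout::Rgb8Srgb);
    assert_eq!(needed, 6_220_800);
    println!("ok");
}
```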

Chroma Subsampling

use zenjpeg::encoder::{EncoderConfig, ChromaSubsampling, XybSubsampling};

// YCbCr subsampling
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter);  // 4:2:0 (best compression)
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::None);     // 4:4:4 (best quality)
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::HalfHorizontal); // 4:2:2
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::HalfVertical);   // 4:4:0

// XYB B-channel subsampling
let config = EncoderConfig::xyb(85, XybSubsampling::BQuarter); // B at 4:2:0
let config = EncoderConfig::xyb(85, XybSubsampling::Full);    // No subsampling

Resource Estimation

use zenjpeg::encoder::{EncoderConfig, ChromaSubsampling};

let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter);

// Typical memory estimate
let estimate = config.estimate_memory(1920, 1080);

// Guaranteed upper bound (for resource reservation)
let ceiling = config.estimate_memory_ceiling(1920, 1080);

Decoder API

Prerelease: The decoder API is behind the decoder feature flag and will have breaking changes. Enable with zenjpeg = { version = "...", features = ["decoder"] }.

All decoder types are in zenjpeg::decoder:

use zenjpeg::decoder::{Decoder, DecodeResult};

Basic Decoding

use zenjpeg::decoder::Decoder;
use enough::Unstoppable;

// Decode to u8 RGB (default)
let result = Decoder::new().decode(&jpeg_data, Unstoppable)?;
let pixels: &[u8] = result.pixels_u8().expect("u8 output");
let (width, height) = result.dimensions();

High-Precision Decoding (f32)

Use OutputTarget for f32 output with different transfer functions:

use zenjpeg::decoder::{Decoder, OutputTarget};
use enough::Unstoppable;

// sRGB gamma-encoded f32 (0.0-1.0 range)
let result = Decoder::new()
    .output_target(OutputTarget::SrgbF32)
    .decode(&jpeg_data, Unstoppable)?;
let pixels: &[f32] = result.pixels_f32().expect("f32 output");

// Linear light f32 (for compositing, HDR)
let result = Decoder::new()
    .output_target(OutputTarget::LinearF32)
    .decode(&jpeg_data, Unstoppable)?;

// Convert f32 to u8 or u16 when needed
let u8_pixels: Option<Vec<u8>> = result.to_u8();
let u16_pixels: Option<Vec<u16>> = result.to_u16();

YCbCr Output (Zero Color Conversion)

For video pipelines or re-encoding:

use zenjpeg::decoder::{Decoder, DecodedYCbCr};

let ycbcr: DecodedYCbCr = Decoder::new().decode_to_ycbcr_f32(&jpeg_data)?;
// Access Y, Cb, Cr planes directly (f32, range [-128, 127])

Reading JPEG Info Without Decoding

let info = Decoder::new().read_info(&jpeg_data)?;
println!("{}x{}, {} components", info.width, info.height, info.num_components);

Decoder Options

Method Description Default
.output_format(fmt) Output pixel format Rgb
.fancy_upsampling(bool) Smooth chroma upsampling true
.block_smoothing(bool) DCT block edge smoothing false
.apply_icc(bool) Apply embedded ICC profile true
.dequant_bias(bool) Laplacian dequantization biases (see below) false
.max_pixels(n) Pixel count limit (DoS protection) 100M
.max_memory(n) Memory limit in bytes 512 MB

Output Formats

PixelFormat Bytes/px Description
Rgb 3 R-G-B (default)
Bgr 3 B-G-R (Windows/GDI)
Rgba 4 R-G-B-A, alpha = 255
Bgra 4 B-G-R-A, alpha = 255
Bgrx 4 B-G-R-X, pad = 255
Gray 1 Grayscale

All formats work with buffered decode (.decode()), the fast i16 path, and the streaming scanline reader.

Decoded Image Methods

let image = Decoder::new().decode(&jpeg_data, Unstoppable)?;

image.width()           // Image width
image.height()          // Image height
image.dimensions()      // (width, height) tuple
image.pixels()          // &[u8] pixel data
image.bytes_per_pixel() // Bytes per pixel for format
image.stride()          // Bytes per row

DecoderConfig (Advanced)

use zenjpeg::decoder::{Decoder, DecoderConfig};
use enough::Unstoppable;

// Most users should use the builder methods instead:
let image = Decoder::new()
    .fancy_upsampling(true)
    .block_smoothing(false)
    .apply_icc(true)
    .dequant_bias(false)
    .max_pixels(100_000_000)
    .max_memory(512 * 1024 * 1024)
    .decode(&jpeg_data, Unstoppable)?;

// Or construct DecoderConfig directly:
let config = DecoderConfig::default();

Streaming Decode (Scanline Reader)

Decode row-by-row for minimal memory usage:

use zenjpeg::decoder::Decoder;
use imgref::ImgRefMut;

let mut reader = Decoder::new().scanline_reader(&jpeg_data)?;
let (w, h) = (reader.width() as usize, reader.height() as usize);
let mut buf = vec![0u8; w * h * 4];

let mut rows = 0;
while !reader.is_finished() {
    let slice = &mut buf[rows * w * 4..];
    let output = ImgRefMut::new(slice, w * 4, h - rows);
    rows += reader.read_rows_bgra8(output)?;
}

Method Bytes/px Format
read_rows_rgb8() 3 R-G-B
read_rows_bgr8() 3 B-G-R
read_rows_rgbx8() 4 R-G-B-X (pad=255)
read_rows_rgba8() 4 R-G-B-A (A=255)
read_rows_bgra8() 4 B-G-R-A (A=255)
read_rows_bgrx8() 4 B-G-R-X (pad=255)
read_rows_rgba_f32() 16 Linear f32 RGBA
read_rows_gray8() 1 Grayscale u8
read_rows_gray_f32() 4 Grayscale f32

Performance

Encoding Speed

Image Size Sequential Progressive Notes
512x512 118 MP/s 58 MP/s Small images
1024x1024 92 MP/s 36 MP/s Medium images
2048x2048 87 MP/s 46 MP/s Large images

Sequential vs Progressive

Quality Seq Size Prog Size Prog Δ Prog Slowdown
Q50 322 KB 313 KB -2.8% 2.5x
Q70 429 KB 416 KB -3.0% 2.0x
Q85 586 KB 568 KB -3.1% 2.1x
Q95 915 KB 887 KB -3.1% 2.2x

Progressive produces ~3% smaller files at the same quality, but takes ~2x longer.

Recommendation:

  • Use Sequential for: real-time encoding, high throughput
  • Use Progressive for: web delivery, storage optimization

Decoding Speed

The default decode path uses fast integer IDCT (matching zune-jpeg performance). The f32 pipeline is used for XYB images or when dequant_bias(true) is enabled.

Mode 2048x2048 vs zune-jpeg Notes
Scanline 4:2:0 4.03ms 0.99x Matches zune-jpeg
Scanline 4:4:4 5.78ms 0.91x Beats zune-jpeg
Buffered fast 4.72ms 1.15x Two-pass overhead
Buffered default 5.51ms 1.35x f32 upsampling

Dequantization Bias

Decoder::new().dequant_bias(true) enables optimal Laplacian dequantization biases (Price & Rabbani 2000). This computes per-coefficient biases from DCT coefficient statistics and applies them during f32 dequantization, matching C++ jpegli's decoder behavior.

Tradeoff: Bypasses the fast integer IDCT path. The quality difference vs the default integer IDCT is image-dependent and small in either direction:

Quality Default SSIM2 +bias SSIM2 C++ jpegli bias vs default
Q50 37.28 35.95 36.01 -1.32 pts
Q85 50.45 50.18 50.21 -0.27 pts
Q95 53.28 53.25 53.27 -0.03 pts

(frymire 1118x1105, SSIMULACRA2 vs original, higher = better)

The bias path consistently tracks C++ jpegli output within 0.02-0.11 SSIMULACRA2 points. Use it when you need decode output to match C++ jpegli, or when processing pipelines assume jpegli-style reconstruction.

Table Optimization

The EncodingTables API provides fine-grained control over quantization and zero-bias tables for researching better encoding parameters.

Quick Start

use zenjpeg::encoder::{EncoderConfig, ChromaSubsampling};
use zenjpeg::encoder::tuning::{EncodingTables, ScalingParams, dct};

// Start from defaults and modify
let mut tables = EncodingTables::default_ycbcr();

// Scale a specific coefficient (component 0 = Y, k = coefficient index)
tables.scale_quant(0, 5, 1.2);  // 20% higher quantization at position 5

// Or use exact quantization values (no quality scaling)
tables.scaling = ScalingParams::Exact;
tables.quant.c0[0] = 16.0;  // DC quantization for Y

let config = EncoderConfig::ycbcr(85.0, ChromaSubsampling::Quarter)
    .tables(Box::new(tables));

Understanding the Parameters

Quantization Tables (quant): 64 coefficients per component (Y/Cb/Cr or X/Y/B)

  • Lower values = more precision = larger file
  • Higher values = more compression = smaller file
  • DC (index 0) affects brightness uniformity
  • Low frequencies (indices 1, 8, 9, 16, 17) affect gradients
  • High frequencies affect edges and texture

Zero-Bias Tables (zero_bias_mul, zero_bias_offset_*):

  • Control rounding behavior during quantization
  • zero_bias_mul[k] multiplies the dead zone around zero
  • Higher values = more aggressive zeroing of small coefficients = smaller files
  • zero_bias_offset_dc/ac add to the threshold before zeroing
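The dead-zone behavior these bullets describe can be sketched on a single coefficient. This is a simplified illustration only (zenjpeg's real implementation works on f32 DCT blocks with per-coefficient tables and offsets):

```rust
/// Quantize one DCT coefficient with a dead zone around zero.
/// `bias_mul` widens (>1.0) or narrows (<1.0) the zone in which
/// small coefficients are rounded away to zero.
fn quantize(coef: f32, quant: f32, bias_mul: f32) -> i32 {
    let scaled = coef / quant;
    // Plain rounding would zero |scaled| < 0.5; the multiplier
    // scales that threshold.
    if scaled.abs() < 0.5 * bias_mul {
        0
    } else {
        scaled.round() as i32
    }
}

fn main() {
    // coef 10, quant 16 -> scaled 0.625: survives plain rounding...
    assert_eq!(quantize(10.0, 16.0, 1.0), 1);
    // ...but is zeroed once the dead zone is widened by 40%,
    // which is how aggressive zeroing shrinks files.
    assert_eq!(quantize(10.0, 16.0, 1.4), 0);
    println!("ok");
}
```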

Scaling Params:

  • ScalingParams::Scaled { global_scale, frequency_exponents } - quality-dependent scaling
  • ScalingParams::Exact - use raw values (must be valid u16 range)

DCT Coefficient Layout

Position in 8x8 block (row-major index k):
 0  1  2  3  4  5  6  7
 8  9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63

k=0 is DC (average brightness)
k=1,8 are lowest AC frequencies (horizontal/vertical gradients)
k=63 is highest frequency (diagonal detail)

Use dct::freq_distance(k) to get Manhattan distance from DC (0-14). Use dct::IMPORTANCE_ORDER for coefficients sorted by perceptual impact.
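What these helpers compute follows directly from the row-major layout above. A hypothetical reimplementation, not zenjpeg's code:

```rust
/// (row, col) of coefficient k in the 8x8 block, row-major.
fn row_col(k: usize) -> (usize, usize) {
    (k / 8, k % 8)
}

/// Manhattan distance from DC: 0 for k=0, up to 14 for k=63.
fn freq_distance(k: usize) -> usize {
    let (r, c) = row_col(k);
    r + c
}

fn main() {
    assert_eq!(freq_distance(0), 0);   // DC
    assert_eq!(freq_distance(1), 1);   // lowest horizontal AC
    assert_eq!(freq_distance(8), 1);   // lowest vertical AC
    assert_eq!(freq_distance(63), 14); // highest diagonal detail
    println!("ok");
}
```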

Research Methodology

1. Corpus-Based Optimization

use zenjpeg::encoder::tuning::{EncodingTables, dct};

fn evaluate_tables(tables: &EncodingTables, corpus: &[Image]) -> f64 {
    let mut total_score = 0.0;
    for image in corpus {
        let jpeg = encode_with_tables(image, tables);
        let score = ssimulacra2_per_byte(&jpeg, image);  // quality/size
        total_score += score;
    }
    total_score / corpus.len() as f64
}

// Grid search over coefficient k
fn optimize_coefficient(k: usize, component: usize, corpus: &[Image]) {
    let mut best_score = f64::MIN;
    let mut best_value = 1.0;

    for scale in [0.5, 0.75, 1.0, 1.25, 1.5, 2.0] {
        let mut tables = EncodingTables::default_ycbcr();
        tables.scale_quant(component, k, scale);

        let score = evaluate_tables(&tables, corpus);
        if score > best_score {
            best_score = score;
            best_value = scale;
        }
    }
    println!("Coefficient {} best scale: {}", k, best_value);
}

2. Gradient-Free Optimization

For automated discovery, use derivative-free optimizers:

// Using argmin crate with Nelder-Mead
use argmin::solver::neldermead::NelderMead;

fn objective(params: &[f64], corpus: &[Image]) -> f64 {
    let mut tables = EncodingTables::default_ycbcr();

    // Map params to table modifications (e.g., first 10 most impactful coefficients)
    for (i, &scale) in params.iter().enumerate() {
        let k = dct::IMPORTANCE_ORDER[i + 1]; // Skip DC
        tables.scale_quant(0, k, scale as f32); // Y component
    }

    -evaluate_tables(&tables, corpus) // Negative because we minimize
}

Recommended optimizers:

  • CMA-ES (Covariance Matrix Adaptation): Best for 10-50 parameters
  • Nelder-Mead: Good for quick exploration, 5-20 parameters
  • Differential Evolution: Robust, handles constraints well
  • Bayesian Optimization: Sample-efficient when evaluations are expensive

3. Image-Adaptive Tables

Different image categories may benefit from different tables:

Content Type Strategy
Photographs Lower DC/low-freq quant, preserve gradients
Graphics/UI Higher high-freq quant, preserve edges
Text on photos Balance - preserve both
Skin tones Lower Cb/Cr quant in mid frequencies

fn classify_and_encode(image: &Image) -> Vec<u8> {
    let tables = match classify_content(image) {
        ContentType::Photo => tables_optimized_for_photos(),
        ContentType::Graphic => tables_optimized_for_graphics(),
        ContentType::Mixed => EncodingTables::default_ycbcr(),
    };
    encode_with_tables(image, &tables)
}

4. Perceptual Weighting

Use quality metrics to weight optimization:

// SSIMULACRA2 weights certain frequencies more than others
// Butteraugli penalizes different artifacts

fn multi_metric_score(jpeg: &[u8], original: &Image) -> f64 {
    let ssim2 = ssimulacra2(jpeg, original);
    let butteraugli = butteraugli_distance(jpeg, original);
    let size = jpeg.len() as f64;

    // Combine: higher quality, lower butteraugli, smaller size
    (ssim2 * 100.0 - butteraugli * 10.0) / (size / 1000.0)
}

Ideas for Research

  1. Content-aware table selection: Train a classifier to select optimal tables
  2. Quality-dependent tables: Different tables for Q50 vs Q90
  3. Resolution-dependent: High-res images may need different high-freq handling
  4. Per-block adaptive: Use AQ to modulate per-block quantization
  5. Machine learning: Use differentiable JPEG approximations to train tables
  6. Genetic algorithms: Evolve table populations over a corpus
  7. Transfer learning: Start from optimized tables for similar content

Available Helpers

use zenjpeg::encoder::tuning::dct;

// Coefficient analysis
dct::freq_distance(k)       // Manhattan distance from DC (0-14)
dct::row_col(k)             // (row, col) in 8x8 block
dct::to_zigzag(k)           // Row-major to zigzag order
dct::from_zigzag(z)         // Zigzag to row-major
dct::IMPORTANCE_ORDER       // Coefficients by perceptual impact

// Table manipulation
tables.scale_quant(c, k, factor)    // Scale one coefficient
tables.perturb_quant(c, k, delta)   // Add delta to coefficient
tables.blend(&other, t)              // Linear interpolation (0.0-1.0)
tables.quant.scale_component(c, f)   // Scale entire component
tables.quant.scale_all(f)            // Scale all coefficients
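The blend helper described above is a per-coefficient linear interpolation between two tables. A sketch of the idea on plain 64-entry arrays (hypothetical; not the EncodingTables implementation):

```rust
/// Linearly interpolate two 64-entry quant tables:
/// t = 0.0 yields `a`, t = 1.0 yields `b`.
fn blend(a: &[f32; 64], b: &[f32; 64], t: f32) -> [f32; 64] {
    let mut out = [0.0f32; 64];
    for k in 0..64 {
        out[k] = a[k] + (b[k] - a[k]) * t;
    }
    out
}

fn main() {
    let a = [16.0f32; 64];
    let b = [32.0f32; 64];
    // Midpoint between the two tables at every coefficient.
    let mid = blend(&a, &b, 0.5);
    assert_eq!(mid[0], 24.0);
    assert_eq!(mid[63], 24.0);
    println!("ok");
}
```

Blending is useful in optimization loops: it lets a search step smoothly between a known-good table and a candidate rather than jumping between them.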

Overshoot Deringing

Enabled by default. This technique was pioneered by @kornel in mozjpeg and significantly improves quality for documents, screenshots, and graphics without any quality penalty for photographic content.

The Problem

JPEG uses DCT (Discrete Cosine Transform) which represents pixel blocks as sums of cosine waves. Hard edges—like text on a white background—create high-frequency components that are difficult to represent accurately. The result is "ringing": oscillating artifacts that look like halos or waves emanating from sharp transitions.

The Insight

JPEG decoders clamp output values to 0-255. This means to display white (255), any encoded value ≥255 works identically after clamping. The encoder can exploit this "headroom" above the displayable range.

The Solution

Instead of encoding a flat plateau at the maximum value, deringing creates a smooth curve that "overshoots" above the maximum:

  • The peak (above 255) gets clamped to 255 on decode
  • The result looks identical to the original
  • But the smooth curve compresses much better with fewer artifacts!

This is analogous to "anti-clipping" in audio processing.
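The clamping headroom can be demonstrated directly. A toy illustration of the decode-side clamp only, not the deringing algorithm itself:

```rust
/// JPEG decoders clamp reconstructed samples to the displayable range.
fn decode_clamp(v: i32) -> u8 {
    v.clamp(0, 255) as u8
}

fn main() {
    // A flat plateau at the maximum and a smooth curve that overshoots
    // past 255 decode to identical pixels after clamping...
    let plateau = [250, 255, 255, 255, 250];
    let overshoot = [250, 262, 270, 262, 250];
    let decoded: Vec<u8> = overshoot.iter().map(|&v| decode_clamp(v)).collect();
    let reference: Vec<u8> = plateau.iter().map(|&v| decode_clamp(v)).collect();
    assert_eq!(decoded, reference);
    // ...but the smooth curve has no hard edge, so its DCT carries far
    // less high-frequency energy and rings less.
    println!("ok");
}
```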

When It Helps Most

  • Documents and screenshots with white backgrounds
  • Text and graphics with hard edges
  • Any image with saturated regions (pixels at 0 or 255)
  • UI elements with sharp corners

Usage

Deringing is on by default. To disable it (not recommended):

let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter)
    .deringing(false);  // Disable deringing

C++ Parity Status

Tested against C++ jpegli on frymire.png (1118x1105):

Metric Rust C++ Difference
File size (Q85 seq) 586.3 KB 586.7 KB -0.1%
File size (Q85 prog) 568.2 KB 565.1 KB +0.5%
SSIM2 (Q85) 69.0 69.0 identical

Quality is effectively identical (mean difference <0.5%); file sizes are within 2%.

Comparing with C++ jpegli: 2 vs 3 Quantization Tables

When comparing output between zenjpeg and C++ jpegli, use jpegli_set_distance() in C++, not jpeg_set_quality(). Here's why:

The issue:

  • jpeg_set_quality() in C++ uses 2 chroma tables (Cb and Cr share the same table)
  • jpegli_set_distance() in C++ uses 3 tables (separate Y, Cb, Cr tables)
  • zenjpeg always uses 3 tables

Using jpeg_set_quality() for comparison will show ~4% file size differences and different quantization behavior because the encoders are configured differently.

Correct comparison (FFI):

// C++ - use distance-based quality (3 tables)
jpegli_set_distance(&cinfo, 1.0, JPEGLI_TRUE);  // distance 1.0 ≈ quality 90

// NOT: jpeg_set_quality(&cinfo, 90, TRUE);  // 2 tables - invalid comparison!

Quality to distance conversion:

fn quality_to_distance(q: f32) -> f32 {
    if q >= 100.0 { 0.01 }
    else if q >= 30.0 { 0.1 + (100.0 - q) * 0.09 }
    else { 53.0 / 3000.0 * q * q - 23.0 / 20.0 * q + 25.0 }
}
// q90 → distance 1.0, q75 → distance 2.35

With proper distance-based comparison, size and quality differences are typically within ±2%.

Matching jpeg_set_quality() behavior:

If you need output that matches tools using jpeg_set_quality() (2 tables), use the .separate_chroma_tables(false) option:

// Match jpeg_set_quality() behavior (2 tables: Y, shared chroma)
let config = EncoderConfig::ycbcr(85, ChromaSubsampling::Quarter)
    .separate_chroma_tables(false);

Feature Flags

Feature Default Description When to Use
decoder ❌ No JPEG decoding - Enables zenjpeg::decoder module Required for any decode operations
std ✅ Yes Standard library support Disable for no_std embedded targets
archmage-simd ✅ Yes Safe SIMD via archmage (~10-20% faster) Keep enabled for best performance
cms-lcms2 ✅ Yes ICC color management via lcms2 XYB decoding, wide-gamut images
cms-moxcms ❌ No Pure Rust color management no_std or avoid C dependencies
parallel ❌ No Multi-threaded encoding via rayon Large images (4K+), server workloads
ultrahdr ❌ No UltraHDR HDR gain map support Encoding/decoding HDR JPEGs
trellis ✅ Yes Trellis quantization (mozjpeg-style) Keep enabled for best compression
yuv ✅ Yes SharpYUV chroma downsampling Keep enabled for quality

By default, the crate uses #![forbid(unsafe_code)]. SIMD is provided via the safe wide crate, with archmage-simd (default) adding token-based intrinsics for ~10-20% speedup.

Common Configurations

# Decode + encode (most common)
[dependencies]
zenjpeg = { version = "0.6", features = ["decoder"] }

# Encode only (default)
[dependencies]
zenjpeg = "0.6"

# High-performance server
[dependencies]
zenjpeg = { version = "0.6", features = ["decoder", "parallel"] }

# Embedded / no_std
[dependencies]
zenjpeg = { version = "0.6", default-features = false, features = ["cms-moxcms"] }

# UltraHDR support
[dependencies]
zenjpeg = { version = "0.6", features = ["decoder", "ultrahdr"] }

Encoder Status

Feature Status
Baseline JPEG Working
Progressive JPEG Working
Adaptive quantization Working
Huffman optimization Working
4:4:4 / 4:2:0 / 4:2:2 / 4:4:0 Working
XYB color space Working
Grayscale Working
Custom quant tables Working
ICC profile embedding Working
YCbCr planar input Working

Decoder Status

Prerelease: Enable with features = ["decoder"]. API will have breaking changes.

Feature Status
Baseline JPEG Working
Progressive JPEG Working
All subsampling modes Working
Restart markers Working
ICC profile extraction Working
XYB decoding Working (with CMS)
f32 output Working

Future Optimization Opportunities

Profiling against C++ jpegli reveals these bottlenecks (2K image, progressive 4:2:0):

Area Rust C++ Gap Notes
RGB→YCbCr 11.7% 1.7% 6.9x Biggest opportunity
Adaptive quantization 28.6% 12.1% 2.4x Algorithm efficiency
Huffman freq counting 5.7% 0.5% 11x Already SIMD, still slow
DCT 7.3% 5.5% 1.3x Reasonable
Entropy encoding 10.9% 35.9% C++ slower here

RGB→YCbCr conversion is the biggest single opportunity; dedicated conversion crates are worth investigating.

Current gap: Rust is ~20% slower than C++ jpegli (1.2x median, range 1.05x-1.43x per criterion benchmarks).

Development

Verify C++ Parity

# Quick parity test (no C++ build needed)
cargo test --release --test cpp_parity_locked

# Full comparison (requires C++ jpegli built)
cargo test --release --test comprehensive_cpp_comparison -- --nocapture --ignored

Building C++ Reference (Optional)

git submodule update --init --recursive
cd internal/jpegli-cpp && mkdir -p build && cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DJPEGXL_ENABLE_TOOLS=ON ..
ninja cjpegli djpegli

License

Sustainable, large-scale open source work requires a funding model, and I have been doing this full-time for 15 years. If you are using this for closed-source development AND make over $1 million per year, you'll need to buy a commercial license at https://www.imazen.io/pricing

Commercial licenses are similar to the Apache 2 license but company-specific, and on a sliding scale. You can also use this under the AGPL v3.

Acknowledgments

Originally a port of jpegli from the JPEG XL project by Google (BSD-3-Clause). After six rewrites, this is now an independent project that shares ideas but little code with the original.

AI Disclosure

Developed with assistance from Claude (Anthropic). Extensively tested against C++ reference with 340+ tests. Report issues at https://github.com/imazen/zenjpeg/issues

Dependencies

~7–12MB
~242K SLoC