#zstd #pure-rust

no-std haagenti-zstd

Zstandard-inspired compression for Haagenti (internal format, not Zstd-compatible)

2 releases

0.1.1 Jan 27, 2026
0.1.0 Jan 27, 2026

#97 in Compression

Download history 7/week @ 2026-02-05 38/week @ 2026-02-19 4/week @ 2026-02-26 22/week @ 2026-03-05

64 downloads per month
Used in 4 crates

MIT/Apache

1MB
18K SLoC

haagenti-zstd

Zstandard-inspired compression for the Haagenti compression framework.

Overview

haagenti-zstd provides a pure Rust implementation inspired by Zstandard, optimized for Haagenti's tensor compression pipeline:

  • Pure Rust: No C dependencies, fully native implementation
  • Internal roundtrip: Compression and decompression within Haagenti
  • Optimized for tensors: Modified format tuned for neural network weight patterns
  • Competitive ratios: Achieves compression ratios close to the reference C implementation

Note: This implementation is not compatible with the reference Zstd format. Data compressed with haagenti-zstd can only be decompressed with haagenti-zstd. The format diverges from RFC 8878 to enable optimizations specific to Haagenti's use cases.

Features

[dependencies]
haagenti-zstd = { version = "0.1", features = ["default"] }
Feature Description
std Standard library support (default)
simd SIMD acceleration via haagenti-simd
dictionary Pre-trained dictionary support
multithread Multi-threaded compression

Quick Start

use haagenti_zstd::{ZstdCodec, ZstdCompressor, ZstdDecompressor};
use haagenti_core::{Compressor, Decompressor, CompressionLevel};

// Using the codec (compression + decompression)
let codec = ZstdCodec::new();
let compressed = codec.compress(b"Hello, World!")?;
let original = codec.decompress(&compressed)?;

// Or use compressor/decompressor separately
let compressor = ZstdCompressor::with_level(CompressionLevel::Best);
let compressed = compressor.compress(&data)?;

let decompressor = ZstdDecompressor::new();
let decompressed = decompressor.decompress(&compressed)?;

Architecture

haagenti-zstd/
├── src/
│   ├── lib.rs              # Main codec, compressor, decompressor
│   ├── compress/           # Compression pipeline
│   │   ├── mod.rs          # CompressContext, frame encoding
│   │   ├── analysis.rs     # Compressibility fingerprinting
│   │   ├── match_finder.rs # LZ77 hash chain matching
│   │   ├── block.rs        # Block-level encoding
│   │   └── sequences.rs    # Sequence encoding (RLE, FSE)
│   ├── decompress.rs       # Decompression pipeline
│   ├── huffman/            # Huffman coding
│   │   ├── mod.rs          # Constants and re-exports
│   │   ├── encoder.rs      # Huffman table building and encoding
│   │   ├── decoder.rs      # Huffman decoding
│   │   └── table.rs        # Huffman table structures
│   ├── fse/                # Finite State Entropy coding
│   │   ├── mod.rs          # FSE constants and tables
│   │   ├── encoder.rs      # FSE encoding
│   │   ├── decoder.rs      # FSE decoding
│   │   └── table.rs        # FSE table structures
│   ├── frame/              # Frame format
│   │   ├── mod.rs          # Frame constants
│   │   ├── header.rs       # Frame header parsing
│   │   ├── block.rs        # Block header parsing
│   │   └── checksum.rs     # XXHash64 implementation
│   └── block/              # Block-level structures
│       ├── mod.rs          # Block types
│       ├── literals.rs     # Literals section
│       └── sequences.rs    # Sequences section
└── benches/
    └── zstd_benchmark.rs   # Comprehensive benchmarks

Compression Pipeline

1. Compressibility Fingerprinting

Unlike traditional compressors that blindly attempt compression, haagenti-zstd uses compressibility fingerprinting to predict the optimal encoding strategy before compression:

pub struct CompressibilityFingerprint {
    pub entropy: f32,           // 0.0 = uniform, 8.0 = random
    pub pattern: PatternType,   // Uniform, Periodic, TextLike, etc.
    pub estimated_ratio: f32,   // Predicted compression ratio
    pub strategy: CompressionStrategy,
}

Pattern Types:

  • Uniform - Single byte repeated (perfect RLE candidate)
  • Periodic - Repeating pattern detected (e.g., "ABCABC")
  • LowEntropy - Few unique values
  • TextLike - ASCII text with common characters
  • HighEntropy - Difficult to compress
  • Random - Incompressible (encrypted/random data)

Strategy Selection:

  • RleBlock - Use block-level RLE for uniform data
  • RleSequences - Use RLE sequence mode for uniform matches
  • PredefinedFse - Use FSE with predefined tables
  • RawBlock - Skip compression, store raw

2. Match Finding

LZ77-style match finding using hash chains:

pub struct MatchFinder {
    hash_table: Vec<u32>,    // 4-byte hash -> position
    hash_chain: Vec<u32>,    // Position -> previous position with same hash
    search_depth: usize,     // Chain depth (controlled by compression level)
}

Compression Levels:

Level Search Depth Use Case
None 1 Fastest, minimal compression
Fast 4 Quick compression
Default 16 Balanced
Best 64 Maximum compression
Ultra 128 Extreme compression

3. Literals Encoding

Literals (unmatched bytes) are compressed using Huffman coding:

Single-Stream (≤1023 bytes):

  • Direct Huffman encoding
  • Sentinel bit marks stream end
  • Read backwards from sentinel

4-Stream (1024-262143 bytes):

  • Split literals into 4 parallel streams
  • Better CPU cache utilization
  • Size_Format=1 (14-bit) or Size_Format=2 (18-bit) headers

4. Sequence Encoding

Matches are encoded as sequences: (literals_length, match_length, offset).

Encoding Modes:

  • RLE Mode - For uniform code values (all sequences have same LL/ML/OF code)
  • FSE Mode - For varied sequences using predefined Zstd tables
  • Raw Mode - Fallback when sequences don't compress

FSE Predefined Tables:

  • Literal Length: 36 symbols, 6-bit accuracy
  • Match Length: 53 symbols, 6-bit accuracy
  • Offset: 32 symbols, 5-bit accuracy

Huffman Encoding Details

Weight Calculation

Huffman weights determine code lengths. In Zstd:

  • Weight w means code_length = max_bits + 1 - w
  • Higher weight = shorter code = more frequent symbol
// Weight assignment based on frequency
let log_freq = 32 - freq.leading_zeros();
let weight = (log_freq as u8).min(MAX_WEIGHT).max(1);

Kraft Inequality Normalization

Valid Huffman codes must satisfy the Kraft inequality:

sum(2^weight) = 2^max_weight

The normalize_weights function iteratively adjusts weights:

  1. If sum > target: reduce largest weights
  2. If sum < target: increase smallest weights
  3. Continue until sum equals target exactly

Key Insight: The iteration limit was increased from 100 to 1000 to handle large data (100KB+) where many symbols have similar high frequencies.

Canonical Code Generation

Canonical Huffman codes ensure deterministic encoding:

  1. Count symbols at each code length
  2. Calculate starting codes for each length
  3. Assign codes in symbol order
// Code length from weight
let code_len = (max_bits + 1 - weight) as usize;

FSE Encoding Details

Finite State Entropy (FSE) is an ANS-based entropy coder:

State Machine

pub struct FseEncoder {
    symbol_table: Vec<Vec<FseEncodeEntry>>,
    state: usize,
    accuracy_log: u8,
}

Each encoding step:

  1. Look up entry for current symbol
  2. Output bits from current state
  3. Transition to next state

Reachability-Aware Encoding

For predefined tables, not all state transitions are valid. The encoder checks reachability:

let is_reachable = match transition_type {
    OffsetValue => offset_entries.iter().any(|e| e.baseline <= value && ...),
    LiteralLength => ll_entries.iter().any(|e| e.baseline <= value && ...),
    MatchLength => ml_entries.iter().any(|e| e.baseline <= value && ...),
};

Compression Ratios

Benchmark results comparing haagenti-zstd to zstd (C reference):

Data Type Size Haagenti zstd (C) Delta
Text 1KB 3.31x 3.15x +0.16x
Text 10KB 4.52x 5.02x -0.50x
Text 100KB 4.72x 5.64x -0.92x
Binary 1KB 3.97x 3.67x +0.30x
Binary 100KB 360.56x 371.01x -10.45x
Repeated 100KB 1575x 1796x -221x
RLE (uniform) 100KB 1575x+ 1796x+ Close

Key achievements:

  • Beats reference implementation on small text (1KB: 3.31x vs 3.15x)
  • Close to reference on most patterns
  • All data types decompress correctly

Block Types

Zstd uses three block types:

Type Code Description
Raw 0 Uncompressed data
RLE 1 Single byte repeated
Compressed 2 Huffman + sequences

The encoder automatically selects the best block type:

  1. Uniform data → RLE block
  2. Compresses well → Compressed block
  3. Expansion → Raw block

Frame Format

┌─────────────────────────────────────────────────────────┐
│ Magic Number (4 bytes): 0xFD2FB528                      │
├─────────────────────────────────────────────────────────┤
│ Frame Header (2-14 bytes)                               │
│   - Descriptor (1 byte)                                 │
│   - Window Descriptor (0-1 bytes)                       │
│   - Dictionary ID (0-4 bytes)                           │
│   - Frame Content Size (0-8 bytes)                      │
├─────────────────────────────────────────────────────────┤
│ Block 1                                                 │
│   - Header (3 bytes): last, type, size                  │
│   - Data (variable)                                     │
├─────────────────────────────────────────────────────────┤
│ Block N (last=1)                                        │
├─────────────────────────────────────────────────────────┤
│ Checksum (0-4 bytes, optional)                          │
└─────────────────────────────────────────────────────────┘

Known Limitations

  1. Not Zstd-compatible: This is an internal format. Data compressed with haagenti-zstd cannot be decompressed with standard Zstd tools, and vice versa. The format was modified to optimize for tensor compression patterns.

  2. FSE Custom Tables: Only predefined tables are supported. Custom table serialization is not yet implemented.

  3. Dictionary Support: Dictionary compression is not implemented.

  4. Symbol Limit: Huffman encoder uses direct format, limited to 127 unique symbols. FSE-compressed weight format would support more.

  5. RLE-like Patterns: Data with varying run lengths (e.g., "aaabbbccc...") may not achieve optimal compression due to predefined table limitations.

Testing

# Run all tests (258 tests)
cargo test -p haagenti-zstd

# Run with output
cargo test -p haagenti-zstd -- --nocapture

# Run benchmarks
cargo bench -p haagenti-zstd --bench zstd_benchmark

Implementation Notes

Critical Insights

  1. Huffman Normalization Iterations: Large data (100KB+) with many symbols at similar frequencies requires ~1000 iterations for weight normalization to converge. The original 100 iterations was insufficient.

  2. Kraft Sum Calculation: The encoder uses sum(2^w) internally, but the decoder expects sum(2^(w-1)) = 2^(max_weight-1). This works because normalization naturally reduces max_weight by 1 during weight adjustment.

  3. Bitstream Direction: Huffman streams are read backwards from a sentinel bit. Symbols must be encoded in reverse order for correct decoding.

  4. FSE State Transitions: Predefined tables have specific state transition patterns. The encoder must verify reachability before attempting FSE encoding.

Code Quality

  • 258 tests covering all components
  • Roundtrip tests for all data patterns
  • Integration tests with embedded test vectors
  • Property-based tests with proptest

References

The following resources informed this implementation, though haagenti-zstd diverges from standard Zstd for optimization purposes:

License

See the workspace LICENSE file.

Dependencies

~0.3–1.2MB
~25K SLoC