#game-cube #wii #yaz0 #file-format #yay0

szs

Lightweight crate for SZS ("Yaz0") compression/decompression as used in Nintendo GameCube/Wii games. The library provides C, C++ and C# bindings. YAY0 ("SZP") is supported, too.

22 unstable releases (3 breaking)

1.1.0 Oct 17, 2023
0.3.7 Jul 20, 2024
0.3.4 Nov 16, 2023

#22 in Games

MIT license

465KB
2.5K SLoC

Rust 1.5K SLoC // 0.0% comments C++ 784 SLoC // 0.0% comments Python 145 SLoC // 0.1% comments C# 133 SLoC // 0.1% comments Visual Studio Solution 31 SLoC

Contains (Windows DLL, 170KB) py/szs.dll, (Windows DLL, 170KB) c#/bindings/szs.dll


Lightweight crate for SZS ("Yaz0") compression/decompression as used in Nintendo GameCube/Wii games. The library provides C, C++, C#, and work-in-progress Python bindings. YAY0 ("SZP") is supported, too.

Rust

The following snippet demonstrates how to compress data into the SZS format using Rust:

let src_data: Vec<u8> = "Hello, World!".as_bytes().to_vec();

match szs::encode(&src_data, szs::EncodeAlgo::Nintendo) {
    Ok(encoded_data) => {
        println!("Encoded into {} bytes", encoded_data.len());
    }
    Err(err) => {
        println!("Encoding failed: {}", err);
    }
}

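And similarly, to decompress (here `encoded_data` is the buffer returned by a successful `szs::encode` call, as in the snippet above):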

match szs::decode(&encoded_data) {
    Ok(decoded_data) => {
        println!("Decoded {} bytes", decoded_data.len());
    }
    Err(err) => {
        println!("Decoding failed: {}", err);
    }
}
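
For readers curious about what decoding involves, here is a minimal, self-contained sketch of a Yaz0 decoder. It is not the crate's implementation: the names `yaz0_decode`, `read_u8` and `sample_stream` are made up for illustration, and the sketch assumes the commonly documented stream layout (a 16-byte header holding the `Yaz0` magic and a big-endian decompressed size, followed by groups of one flag byte and up to eight chunks).

```rust
/// Reads one byte at `pos` and advances it, or reports a truncated stream.
fn read_u8(src: &[u8], pos: &mut usize) -> Result<u8, String> {
    let b = *src.get(*pos).ok_or("truncated stream")?;
    *pos += 1;
    Ok(b)
}

/// Illustrative Yaz0 decoder (hypothetical helper, not part of the szs API).
fn yaz0_decode(src: &[u8]) -> Result<Vec<u8>, String> {
    if src.len() < 16 || !src.starts_with(b"Yaz0") {
        return Err("not a Yaz0 stream".to_string());
    }
    // Bytes 4..8 hold the decompressed size (big-endian); bytes 8..16 are skipped.
    let size = u32::from_be_bytes([src[4], src[5], src[6], src[7]]) as usize;
    let mut dst: Vec<u8> = Vec::with_capacity(size);
    let mut pos = 16;

    while dst.len() < size {
        // Each group starts with a flag byte, consumed from the most significant bit.
        let code = read_u8(src, &mut pos)?;
        for bit in 0..8u32 {
            if dst.len() >= size {
                break;
            }
            let mask = 0x80u8 >> bit;
            if (code & mask) != 0 {
                // Flag bit 1: copy one literal byte.
                dst.push(read_u8(src, &mut pos)?);
            } else {
                // Flag bit 0: back-reference into the already decoded output.
                let b1 = read_u8(src, &mut pos)? as usize;
                let b2 = read_u8(src, &mut pos)? as usize;
                let dist = (((b1 & 0x0F) << 8) | b2) + 1;
                let len = if (b1 >> 4) == 0 {
                    // Long form: a third byte carries the length.
                    read_u8(src, &mut pos)? as usize + 0x12
                } else {
                    (b1 >> 4) + 2
                };
                if dist > dst.len() {
                    return Err("back-reference before start of output".to_string());
                }
                for _ in 0..len {
                    if dst.len() >= size {
                        break;
                    }
                    let b = dst[dst.len() - dist];
                    dst.push(b);
                }
            }
        }
    }
    Ok(dst)
}

/// Hand-built stream: two literals ("ab") plus one back-reference copying 4 bytes.
fn sample_stream() -> Vec<u8> {
    let mut s = Vec::new();
    s.extend_from_slice(b"Yaz0");
    s.extend_from_slice(&6u32.to_be_bytes());
    s.extend_from_slice(&[0u8; 8]);
    s.extend_from_slice(&[0xC0, b'a', b'b', 0x20, 0x01]);
    s
}

fn main() {
    let decoded = yaz0_decode(&sample_stream()).expect("sample stream is valid");
    println!("{}", String::from_utf8_lossy(&decoded));
}
```

In real code, prefer `szs::decode`; the sketch only shows why decoding is cheap compared to encoding, since all the search effort happens on the encoder side.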

C# Bindings

The following snippet demonstrates the C# bindings:

public static void Main(string[] args)
{
    byte[] data = ...;
    szs.CompressionAlgorithm algorithm = szs.CompressionAlgorithm.Nintendo;
    try
    {
        byte[] encodedData = szs.Encode(data, algorithm);
        Console.WriteLine($"Encoded {encodedData.Length} bytes.");
    }
    catch (Exception e)
    {
        Console.WriteLine("Failed to compress: " + e.Message);
    }
}

Warning: szs has a portion implemented in C, which brings its own security considerations.

Algorithms

| Algorithm | Use Case | Description |
|---|---|---|
| EncodeAlgo::Nintendo | Matching decomp projects | The Mario Kart Wii compression algorithm, reverse-engineered. In practice it's a Boyer-Moore-Horspool search with a second-opinion mechanism. |
| EncodeAlgo::Mk8 | General FAST preset. | The Mario Kart 8 compression algorithm, reverse-engineered. In practice it's a sliding Monte Carlo hash table. (Credit @aboood40091, @KillZGaming) |
| EncodeAlgo::MkwSp | | MKW-SP |
| EncodeAlgo::CTGP | CTGP work | CTGP (Reverse engineered. 1:1 matching) |
| EncodeAlgo::WorstCaseEncoding | INSTANT preset. | Worst case |
| EncodeAlgo::Haroohie | | Haroohie (credit @Gericom, adapted from MarioKartToolbox) |
| EncodeAlgo::CTLib | | CTLib (credit @narahiero, adapted from CTLib) |
| EncodeAlgo::LibYaz0 | ULTRA preset. | libyaz0 (Based on wszst. credit @aboood40091) |

Generally, the mk8 algorithm gets acceptable compression the fastest. For cases where filesize matters, lib-yaz0 ties wszst ultra for the smallest filesizes, while being ~25% faster.

Comparison to Other Libraries:

  1. yaz0-rs

    • Performance: EncodeAlgo::LibYaz0 offers superior compression and is approximately 6x faster on reference data compared to yaz0-rs.
    • Note: szs has a portion implemented in C, which brings its own security considerations.
  2. oead

    • Performance: EncodeAlgo::MK8 matches the compression and speed of oead.
    • Size: szs is a lightweight, few-kilobyte, MIT-licensed dependency, while oead is a larger, multi-megabyte, GPL-licensed package.
  3. Wiimm's SZS Tools

    • Performance:
      • EncodeAlgo::LibYaz0 provides equivalent compression to wszst ultra but is about 30% faster and not restricted by the GPL license.
      • EncodeAlgo::MK8 outperforms wszst fast in compression and is 4-5 times faster.

Special Feature: Among the libraries listed, only szs offers comprehensive support for the YAZ0, YAZ1, and YAY0 stream formats.
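
As a rough illustration of how these three stream formats can be told apart, the sketch below inspects the four-byte ASCII magic at the start of a buffer. The `StreamKind` enum and `detect_stream_kind` function are hypothetical names for this example only and are not part of the szs API.

```rust
/// Stream formats named in this README (hypothetical enum for illustration).
#[derive(Debug, PartialEq)]
enum StreamKind {
    Yaz0,
    Yaz1,
    Yay0,
}

/// Returns the stream kind based on the leading 4-byte magic, if recognized.
fn detect_stream_kind(data: &[u8]) -> Option<StreamKind> {
    if data.starts_with(b"Yaz0") {
        Some(StreamKind::Yaz0)
    } else if data.starts_with(b"Yaz1") {
        Some(StreamKind::Yaz1)
    } else if data.starts_with(b"Yay0") {
        Some(StreamKind::Yay0)
    } else {
        None
    }
}

fn main() {
    // Only the magic is examined here; the rest of the header is not validated.
    let header = b"Yay0\x00\x00\x00\x10";
    println!("{:?}", detect_stream_kind(header));
}
```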

Benchmarks

Large file comparison

NSMBU 8-43 (63.9 MB decompressed)

| Method | Time (Avg 3 runs) | Compression Rate | File Size |
|---|---|---|---|
| worst-case-encoding | 0.03s | 112.50% | 71.90 MB |
| mk8 | 1.37s | 29.43% | 18.81 MB |
| ct-lib | 3.01s | 29.74% | 19.01 MB |
| haroohie | 5.79s | 29.74% | 19.01 MB |
| ctgp | 9.23s | 40.91% | 26.14 MB |
| lib-yaz0 | 16.09s | 29.32% | 18.74 MB |
| mkw-sp | 36.77s | 29.74% | 19.01 MB |
| mkw | 55.00s | 29.40% | 18.79 MB |
| mkw (C++) | 63.34s | 29.40% | 18.79 MB |

Comparison with other libraries:

| Method | Time (Avg 3 runs) | Compression Rate | File Size |
|---|---|---|---|
| oead default | 0.61s | 30.09% | 19.23 MB |
| oead max level | 0.99s | 29.96% | 19.15 MB |
| wszst fast | 1.77s | 35.62% | 22.76 MB |
| wszst standard | 11.95s | 29.74% | 19.01 MB |
| wszst ultra | 25.06s | 29.32% | 18.74 MB |

* Average of 3 runs; x64 Clang (17.0.6) build tested on an Intel i9-13900KF on Windows 11
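
The 112.50% rate reported for worst-case-encoding is what one would expect from a stream that stores every byte as a literal: each group of 8 literal bytes needs 1 extra flag byte, so the output is 9/8 of the input plus a 16-byte header. The sketch below illustrates that arithmetic. It assumes the worst-case preset emits literals only, which matches the reported ratio but has not been checked against the crate's source; `yaz0_worst_case` is a hypothetical name.

```rust
/// Literal-only Yaz0 encoder sketch (illustrative; not the szs implementation).
fn yaz0_worst_case(src: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(16 + src.len() + (src.len() + 7) / 8);
    // 16-byte header: magic, big-endian decompressed size, 8 zero bytes.
    out.extend_from_slice(b"Yaz0");
    out.extend_from_slice(&(src.len() as u32).to_be_bytes());
    out.extend_from_slice(&[0u8; 8]);
    for chunk in src.chunks(8) {
        // Flag byte 0xFF marks all (up to) 8 following bytes as literals.
        out.push(0xFF);
        out.extend_from_slice(chunk);
    }
    out
}

fn main() {
    // Same length as the N64 Bowser Castle source file used in the small-file benchmark.
    let src = vec![0u8; 2_574_368];
    let out = yaz0_worst_case(&src);
    let rate = 100.0 * out.len() as f64 / src.len() as f64;
    println!("{} -> {} bytes ({:.2}%)", src.len(), out.len(), rate);
}
```

No match searching happens at all in this scheme, which is why it finishes almost instantly while producing the largest files.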

Small file comparison

Task: Compress N64 Bowser Castle (Source filesize: 2,574,368)

| Method | Time (Avg 3 runs) | Compression Rate | File Size |
|---|---|---|---|
| worst-case-encoding | 0.00s | 112.50% | 2.76 MB |
| mk8 | 0.07s | 56.75% | 1.39 MB |
| ct-lib | 0.19s | 57.24% | 1.41 MB |
| ctgp | 0.21s | 71.41% | 1.75 MB |
| haroohie | 0.31s | 57.23% | 1.41 MB |
| lib-yaz0 | 1.09s | 56.65% | 1.39 MB |
| mkw-sp | 1.47s | 57.23% | 1.41 MB |
| mkw | 3.91s | 56.87% | 1.40 MB |
| mkw (C++) | 4.27s | 56.87% | 1.40 MB |

Comparison with other libraries:

| Method | Time (Avg 3 runs) | Compression Rate | File Size |
|---|---|---|---|
| oead default | 0.03s | 57.63% | 1.41 MB |
| oead max level | 0.05s | 57.52% | 1.41 MB |
| wszst (fast) | 0.197s (via shell) | 65.78% | 1.61 MB |
| wszst (standard) | 0.946s (via shell) | 57.23% | 1.40 MB |
| wszst (ultra) | 1.418s (via shell) | 56.65% | 1.38 MB |
| yaz0-rs | 4.88s (via shell) | 56.87% | 1.39 MB |

* Average of 3 runs; x64 Clang (17.0.6) build tested on an Intel i9-13900KF on Windows 11
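
To reproduce numbers of this kind on your own data, a harness along the following lines may be enough. This is only a sketch, not the benchmark code used for the tables: `average_time` and `compression_rate` are hypothetical helpers, the closure in `main` is a stand-in for a real call such as `szs::encode`, and the compression rate is assumed to mean compressed size divided by original size, which is consistent with the figures above.

```rust
use std::time::{Duration, Instant};

/// Runs `f` `runs` times and returns the mean wall-clock time plus the last output length.
fn average_time<F: FnMut() -> usize>(runs: u32, mut f: F) -> (Duration, usize) {
    let mut total = Duration::ZERO;
    let mut out_len = 0;
    for _ in 0..runs {
        let start = Instant::now();
        out_len = f();
        total += start.elapsed();
    }
    (total / runs, out_len)
}

/// Compressed size as a percentage of the original size (lower is better).
fn compression_rate(compressed: usize, original: usize) -> f64 {
    100.0 * compressed as f64 / original as f64
}

fn main() {
    let src = vec![0u8; 1 << 20];
    // Stand-in "encoder" that just copies the input; swap in a real encoder call here.
    let (avg, out_len) = average_time(3, || src.clone().len());
    println!(
        "avg {:.3}s, rate {:.2}%",
        avg.as_secs_f64(),
        compression_rate(out_len, src.len())
    );
}
```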

(Windows) Performance Comparison: Clang vs. MSVC

On Windows, Microsoft's compiler (MSVC) appears to fall behind Clang for most algorithms by a non-trivial margin:

| Method | Clang Time (s) | MSVC Time (s) | MSVC Uplift vs. Clang (%) |
|---|---|---|---|
| lib-yaz0 | 15.24 | 19.08 | -25.20% |
| mkw | 62.04 | 58.34 | 5.96% |
| mkw-sp | 26.73 | 50.01 | -87.09% |
| haroohie | 5.84 | 5.85 | -0.17% |
| ct-lib | 2.91 | 2.81 | 3.44% |
| mk8 | 1.34 | 1.62 | -20.90% |
| ctgp | 5.22 | 5.88 | -12.64% |

* Average of 3 runs; x64 MSVC build tested on an Intel i9-13900KF on Windows 11

Average Performance Uplift: -19.51%
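
The uplift column is not defined explicitly, but the figures are reproduced by `(clang - msvc) / clang * 100`, so a negative value means the MSVC build was slower. The short sketch below recomputes the column and its average from the timings in the table; the function names are made up for this example.

```rust
/// Relative change of the MSVC time against the Clang time, in percent.
/// Negative values mean the MSVC build took longer.
fn uplift_percent(clang_s: f64, msvc_s: f64) -> f64 {
    (clang_s - msvc_s) / clang_s * 100.0
}

/// Unweighted mean of the per-method uplift values.
fn average_uplift(rows: &[(&str, f64, f64)]) -> f64 {
    let sum: f64 = rows.iter().map(|&(_, c, m)| uplift_percent(c, m)).sum();
    sum / rows.len() as f64
}

/// Timings copied from the Clang vs. MSVC table: (method, Clang seconds, MSVC seconds).
const ROWS: [(&str, f64, f64); 7] = [
    ("lib-yaz0", 15.24, 19.08),
    ("mkw", 62.04, 58.34),
    ("mkw-sp", 26.73, 50.01),
    ("haroohie", 5.84, 5.85),
    ("ct-lib", 2.91, 2.81),
    ("mk8", 1.34, 1.62),
    ("ctgp", 5.22, 5.88),
];

fn main() {
    for &(name, clang, msvc) in ROWS.iter() {
        println!("{:<10} {:>7.2}%", name, uplift_percent(clang, msvc));
    }
    println!("average    {:>7.2}%", average_uplift(&ROWS));
}
```

Note that this is an unweighted mean, so the single mkw-sp outlier accounts for a large share of the -19.51% average.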

Recommendation

Based on the performance results, Clang is generally preferred. To set Clang as the compiler for szs, run the following command before cargo build:

SET CXX=clang

Additionally, using compatible Clang and Rust versions allows for cross-language link-time optimization (LTO).

Example (C Bindings)

#include <stdio.h>
#include <stdlib.h>

#include "szs.h"

// `data` is assumed to be a byte array in scope, so sizeof(data) is its length.

// Calculate the upper bound for encoding.
u32 max_size = riiszs_encoded_upper_bound(sizeof(data));

// Allocate a buffer based on the calculated upper bound.
void* encoded_buf = malloc(max_size);
if (!encoded_buf) {
	fprintf(stderr, "Failed to allocate %u bytes.\n", max_size);
	return -1;
}

// Boyer-Moore-Horspool variant
u32 algorithm = RII_SZS_ENCODE_ALGO_NINTENDO;

u32 actual_len = 0;
const char* ec = riiszs_encode_algo_fast(encoded_buf, max_size, data, sizeof(data), &actual_len, algorithm);
if (ec != NULL) {
	fprintf(stderr, "Failed to compress: %s\n", ec);
	riiszs_free_error_message(ec);
	free(encoded_buf); // Do not leak the output buffer on failure.
	return -1;
}
printf("Encoded %u bytes.\n", actual_len);
// Optionally: shrink encoded_buf to the actual size.
encoded_buf = realloc(encoded_buf, actual_len);

C++ Wrapper on top of C Bindings

A CMake example is provided, too.

#include <print>
#include <vector>

#include "szs.h"

// Boyer-Moore-Horspool variant
szs::Algo algorithm = szs::Algo::Nintendo;
auto encoded = szs::encode(data, algorithm);
if (!encoded) {
	std::println(stderr, "Failed to compress: {}.", encoded.error());
	return -1;
}
std::vector<u8> szs_data = *encoded;
std::println("Encoded {} bytes.", szs_data.size());

License

This library is published under the MIT license.
