#crc #simd #nvme #crc64 #checksum

bin+lib crc64fast-nvme

SIMD accelerated CRC-64/NVME checksum calculation

4 stable releases

1.1.1 Dec 28, 2024
1.1.0 Dec 27, 2024
1.0.1 Dec 11, 2024
1.0.0 Sep 7, 2024

#175 in Hardware support


416 downloads per month
Used in aws-smithy-checksums

MIT/Apache

170KB
5.5K SLoC

crc64fast-nvme


SIMD-accelerated CRC-64/NVME checksum computation using carryless multiplication. It is similar to crc32fast and is a fork of crc64fast, which calculates CRC-64/XZ (a.k.a. CRC-64/GO-ECMA).

CRC-64/NVME comes from the NVM Express® NVM Command Set Specification (Revision 1.0d, December 2023). It has also been implemented in the Linux kernel (where it's called CRC-64/Rocksoft) and is AWS S3's recommended checksum option, under the name CRC64-NVME. Note that the Check value in the spec uses incorrect endianness (Section 5.2.1.3.4, Figure 120, page 83).

The SIMD-accelerated carryless multiplication is based on the Intel paper Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction.

Changes

See CHANGELOG.

Changes from crc64fast

The main change from crc64fast is the polynomial: crc64fast implements CRC-64/XZ (a.k.a. CRC-64/GO-ECMA) with the ECMA-182 polynomial 0x42F0E1EBA9EA3693, whereas this crate uses the NVME polynomial 0xAD93D23594C93659. The input parameters for fast operation (tables, keys, mu, and reciprocal polynomial) are recalculated to match.
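To make the role of the polynomial concrete, here is a minimal bit-at-a-time reference sketch of CRC-64/NVME. It is not the crate's implementation and is far slower than either of the crate's code paths; it only illustrates the parameters. The function name crc64_nvme_bitwise is made up for this example, and the initial value, final XOR, and reflected bit order are the commonly catalogued CRC-64/NVME parameters rather than something stated in this README.

```rust
/// CRC-64/NVME polynomial in normal (MSB-first) form, as quoted above.
const POLY_NVME: u64 = 0xAD93_D235_94C9_3659;

/// Bit-at-a-time reference CRC-64/NVME (hypothetical helper, for cross-checking only).
/// Assumes the usual catalogue parameters: init and final XOR of all ones, reflected I/O.
fn crc64_nvme_bitwise(data: &[u8]) -> u64 {
    // The algorithm is reflected (LSB-first), so the loop uses the
    // bit-reversed polynomial, 0x9A6C9329AC4BC9B5.
    let poly_reflected = POLY_NVME.reverse_bits();
    let mut crc = u64::MAX; // initial value: all ones
    for &byte in data {
        crc ^= u64::from(byte);
        for _ in 0..8 {
            // mask is all ones when the low bit is set, zero otherwise
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (poly_reflected & mask);
        }
    }
    !crc // final XOR with all ones
}
```

Calling crc64_nvme_bitwise(b"hello world!") should reproduce the 0xd9160d1fa8e418e3 value shown in the Usage section.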

Usage

Rust

use crc64fast_nvme::Digest;

let mut c = Digest::new();
c.write(b"hello ");
c.write(b"world!");
let checksum = c.sum64();
assert_eq!(checksum, 0xd9160d1fa8e418e3);

C-compatible shared library

cargo build will produce a shared library target (.so on Linux, .dll on Windows, .dylib on macOS, etc.) and a crc64vnme.h header file for use in non-Rust projects, such as through FFI.

For example, the crc-fast-php library uses it from PHP:

/** @var \FFI $ffi */

$digest = $ffi->digest_new();
$ffi->digest_write($digest, 'hello world!', 12);
$checksum = $ffi->digest_sum64($digest); // 0xd9160d1fa8e418e3

CLI example

A simple CLI implementation can be found in crc_64_nvme_checksum.rs, which will calculate the CRC-64/NVME checksum for a file on disk.
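The general shape of such a tool can be sketched as follows. This is not the contents of crc_64_nvme_checksum.rs; it is a self-contained illustration that streams a reader through a digest in fixed-size chunks. SlowDigest is a stand-in written for this example (bit-at-a-time, mirroring the new/write/sum64 names from the Usage section) so the block compiles without the crate; a real tool would use crc64fast_nvme::Digest instead. The helper names checksum_reader and checksum_file and the 64 KiB buffer size are likewise assumptions.

```rust
use std::fs::File;
use std::io::{self, ErrorKind, Read};
use std::path::Path;

/// Stand-in digest for illustration only; swap in `crc64fast_nvme::Digest`.
struct SlowDigest(u64);

impl SlowDigest {
    fn new() -> Self {
        SlowDigest(u64::MAX)
    }

    fn write(&mut self, bytes: &[u8]) {
        const POLY_REFLECTED: u64 = 0x9A6C_9329_AC4B_C9B5;
        for &b in bytes {
            self.0 ^= u64::from(b);
            for _ in 0..8 {
                let mask = (self.0 & 1).wrapping_neg();
                self.0 = (self.0 >> 1) ^ (POLY_REFLECTED & mask);
            }
        }
    }

    fn sum64(&self) -> u64 {
        !self.0
    }
}

/// Streams any reader through the digest without loading it all into memory.
fn checksum_reader<R: Read>(mut reader: R) -> io::Result<u64> {
    let mut digest = SlowDigest::new();
    let mut buf = vec![0u8; 64 * 1024]; // assumed chunk size
    loop {
        match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => digest.write(&buf[..n]),
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(digest.sum64())
}

/// Checksums a file on disk (hypothetical helper name).
fn checksum_file<P: AsRef<Path>>(path: P) -> io::Result<u64> {
    checksum_reader(File::open(path)?)
}
```

A main function would typically take the path from std::env::args, call checksum_file, and print the result in hexadecimal. Because the digest is fed incrementally, the result does not depend on how the input is split into chunks.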

Other CRC-64 implementations

Tooling to re-calculate input parameters for other CRC-64 implementations/polynomials is supplied in src\bin.

Performance

crc64fast-nvme provides two fast implementations, and the most performant one is chosen at runtime based on the CPU features available.

  • a fast, platform-agnostic table-based implementation, processing 16 bytes at a time.
  • a SIMD-carryless-multiplication based implementation on modern processors:
    • using PCLMULQDQ + SSE 4.1 on x86/x86_64
    • using PMULL + NEON on AArch64 (64-bit ARM)
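Runtime selection between the implementations listed above could look roughly like the sketch below. The Backend enum and detect_backend function are hypothetical names for illustration, not the crate's internals. The feature strings are an assumption about how the checks would be spelled with the standard library's detection macros; on AArch64 the "aes" detection name is used here on the understanding that it covers the PMULL instructions.

```rust
/// Which code path a dispatcher might pick (hypothetical enum for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Table,
    Pclmulqdq,
    Pmull,
}

/// Picks the fastest backend the current CPU appears to support.
fn detect_backend() -> Backend {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // PCLMULQDQ + SSE 4.1, as listed for x86/x86_64.
        if is_x86_feature_detected!("pclmulqdq") && is_x86_feature_detected!("sse4.1") {
            return Backend::Pclmulqdq;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        // PMULL + NEON, as listed for AArch64 (assumed feature names).
        if std::arch::is_aarch64_feature_detected!("aes")
            && std::arch::is_aarch64_feature_detected!("neon")
        {
            return Backend::Pmull;
        }
    }
    // Platform-agnostic fallback.
    Backend::Table
}
```

Doing the check at runtime rather than at compile time lets a single binary run on older CPUs while still using the SIMD path where it exists.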

Algorithm                      Throughput (x86_64)   Throughput (aarch64)
crc 3.0.1                      0.5 GiB/s             0.3 GiB/s
crc64fast-nvme (table)         2.3 GiB/s             1.8 GiB/s
crc64fast-nvme (SIMD)          28.2 GiB/s            20.0 GiB/s
crc64fast-nvme (VPCLMULQDQ)    52 GiB/s              n/a
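For readers curious how a table-based implementation can process 16 bytes at a time, the following is a minimal slicing-by-16 sketch. It is an illustration of the general technique under the usual CRC-64/NVME parameters (reflected, init and final XOR of all ones), not the crate's actual table code; build_tables and crc64_nvme_slice16 are names invented for this example, and the tables are built at runtime here for brevity.

```rust
/// Bit-reversed form of the NVME polynomial 0xAD93D23594C93659.
const POLY_REFLECTED: u64 = 0x9A6C_9329_AC4B_C9B5;

/// Builds 16 tables of 256 entries. tables[k][i] is the CRC state after
/// byte `i` followed by `k` zero bytes, starting from a zero state.
fn build_tables() -> Vec<[u64; 256]> {
    let mut tables = vec![[0u64; 256]; 16];
    for i in 0..256usize {
        let mut crc = i as u64;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (POLY_REFLECTED & mask);
        }
        tables[0][i] = crc;
    }
    for k in 1..16 {
        for i in 0..256 {
            let prev = tables[k - 1][i];
            // Advance the previous entry by one zero byte.
            let next = (prev >> 8) ^ tables[0][(prev & 0xff) as usize];
            tables[k][i] = next;
        }
    }
    tables
}

/// Slicing-by-16: sixteen table lookups per 16-byte block, then a
/// byte-at-a-time loop for the remaining tail.
fn crc64_nvme_slice16(tables: &[[u64; 256]], data: &[u8]) -> u64 {
    let mut crc = u64::MAX;
    let mut chunks = data.chunks_exact(16);
    for chunk in &mut chunks {
        // Fold the current state into the first 8 bytes of the block.
        let mut head = [0u8; 8];
        head.copy_from_slice(&chunk[..8]);
        let folded = crc ^ u64::from_le_bytes(head);
        let mut next = 0u64;
        for j in 0..8 {
            let idx = ((folded >> (8 * j)) & 0xff) as usize;
            next ^= tables[15 - j][idx];
        }
        for j in 8..16 {
            next ^= tables[15 - j][chunk[j] as usize];
        }
        crc = next;
    }
    for &b in chunks.remainder() {
        crc = (crc >> 8) ^ tables[0][((crc ^ u64::from(b)) & 0xff) as usize];
    }
    !crc
}
```

The sixteen lookups within a block are independent of one another, which is what lets this approach outrun a plain byte-at-a-time table loop. A production implementation would normally precompute the tables as constants instead of building them on each run.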

Experimental "Vector Carry-Less Multiplication of Quadwords" (VPCLMULQDQ) support

Using Rust's support for AVX-512 intrinsics, specifically VPCLMULQDQ, throughput can be improved substantially on x86_64 processors that support it (Intel Ice Lake+ and AMD Zen4+).

Specifically, on an m7i.8xlarge EC2 instance (4th gen Xeon, a.k.a. Sapphire Rapids), throughput approximately doubles from ~26 GiB/s to ~52 GiB/s.

Since these intrinsics are currently marked as unstable in Rust, you'll need to build with nightly and enable the vpclmulqdq feature:

rustup toolchain install nightly
cargo +nightly build --features="vpclmulqdq" -r

License

crc64fast-nvme is dual-licensed under

Dependencies

~0.1–1.5MB
~22K SLoC