4 stable releases
Version | Release date |
---|---|
1.1.1 | Dec 28, 2024 |
1.1.0 | Dec 27, 2024 |
1.0.1 | Dec 11, 2024 |
1.0.0 | Sep 7, 2024 |
Used in aws-smithy-checksums
crc64fast-nvme
SIMD-accelerated carryless-multiplication CRC-64/NVME checksum computation (similar to crc32fast and forked from crc64fast, which calculates CRC-64/XZ [a.k.a. CRC-64/GO-ECMA]).
CRC-64/NVME comes from the NVM Express® NVM Command Set Specification (Revision 1.0d, December 2023). It has also been implemented in the Linux kernel (where it's called CRC-64/Rocksoft) and is AWS S3's recommended checksum option, as CRC64-NVME. (Note that the Check value in the spec uses incorrect endianness [Section 5.2.1.3.4, Figure 120, page 83].)
SIMD-accelerated carryless-multiplication is based on the Intel Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction paper.
Changes
See CHANGELOG.
Changes from crc64fast
Primarily changes the CRC-64/XZ (aka CRC-64/GO-ECMA) polynomial from crc64fast (which uses the ECMA-182 polynomial 0x42F0E1EBA9EA3693) to the NVME polynomial (0xAD93D23594C93659), and re-calculates the input parameters (tables, keys, mu, and reciprocal polynomial) for fast operations.
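To make the effect of the polynomial swap concrete, here is a minimal bit-at-a-time sketch. It is not the crate's code: the function names are hypothetical, and it assumes the commonly published parameters for both variants (reflected input and output, initial value and final XOR of all ones). It is slow, but handy as a reference when checking a fast implementation.

```rust
// The two polynomials named above, in normal (non-reflected) form.
const ECMA_182_POLY: u64 = 0x42F0_E1EB_A9EA_3693; // CRC-64/XZ
const NVME_POLY: u64 = 0xAD93_D235_94C9_3659; // CRC-64/NVME

/// Bitwise reflected CRC-64 (hypothetical helper, for illustration only).
/// Assumes init = all ones and final XOR = all ones.
fn crc64_reflected(poly: u64, data: &[u8]) -> u64 {
    // Reflected algorithms shift right, so the polynomial is bit-reversed.
    let poly_reflected = poly.reverse_bits();
    let mut crc = u64::MAX;
    for &byte in data {
        crc ^= byte as u64;
        for _ in 0..8 {
            crc = if crc & 1 == 1 {
                (crc >> 1) ^ poly_reflected
            } else {
                crc >> 1
            };
        }
    }
    !crc
}

/// CRC-64/XZ, as computed by the original crc64fast.
fn crc64_xz(data: &[u8]) -> u64 {
    crc64_reflected(ECMA_182_POLY, data)
}

/// CRC-64/NVME, as computed by this crate.
fn crc64_nvme(data: &[u8]) -> u64 {
    crc64_reflected(NVME_POLY, data)
}
```

Under these assumptions, `crc64_nvme(b"hello world!")` should agree with the checksum shown in the Usage section, while `crc64_xz` yields a different value for the same input.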
Usage
Rust
use crc64fast_nvme::Digest;
let mut c = Digest::new();
c.write(b"hello ");
c.write(b"world!");
let checksum = c.sum64();
assert_eq!(checksum, 0xd9160d1fa8e418e3);
C-compatible shared library
cargo build will produce a shared library target (.so on Linux, .dll on Windows, .dylib on macOS, etc) and crc64vnme.h header file for use in non-Rust projects, such as through FFI.
There is a crc-fast-php library using it with PHP, for example.
/** @var \FFI $ffi */
$digest = $ffi->digest_new();
$ffi->digest_write($digest, 'hello world!', 12);
$checksum = $ffi->digest_sum64($digest); // 0xd9160d1fa8e418e3
CLI example
A simple CLI implementation can be found in crc_64_nvme_checksum.rs, which will calculate the CRC-64/NVME checksum for a file on disk.
Other CRC-64 implementations
Tooling to re-calculate input parameters for other CRC-64 implementations/polynomials is supplied in src/bin.
Performance
crc64fast-nvme provides two fast implementations, and the most performant one will be chosen based on CPU features at runtime.
- a fast, platform-agnostic table-based implementation, processing 16 bytes at a time.
- a SIMD-carryless-multiplication based implementation on modern processors:
  - using PCLMULQDQ + SSE 4.1 on x86/x86_64
  - using PMULL + NEON on AArch64 (64-bit ARM)
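For readers unfamiliar with the table-based approach, the following is a rough sketch of the generic "slicing-by-16" technique, which consumes 16 input bytes per step using 16 lookup tables. It is an illustration under assumptions, not the crate's actual code: all names here are hypothetical, and the tables are built at runtime for simplicity.

```rust
// Reflected form of the NVME polynomial 0xAD93D23594C93659.
const POLY_REFLECTED: u64 = 0xAD93_D235_94C9_3659u64.reverse_bits();

/// Build 16 tables; tables[k][b] is the CRC state after byte `b`
/// followed by `k` zero bytes (hypothetical helper).
fn build_tables() -> [[u64; 256]; 16] {
    let mut t = [[0u64; 256]; 16];
    for i in 0..256 {
        let mut crc = i as u64;
        for _ in 0..8 {
            crc = if crc & 1 == 1 { (crc >> 1) ^ POLY_REFLECTED } else { crc >> 1 };
        }
        t[0][i] = crc;
    }
    for k in 1..16 {
        for i in 0..256 {
            let prev = t[k - 1][i];
            t[k][i] = (prev >> 8) ^ t[0][(prev & 0xff) as usize];
        }
    }
    t
}

/// Read up to 8 bytes as a little-endian u64.
fn load_le(bytes: &[u8]) -> u64 {
    bytes.iter().rev().fold(0u64, |acc, &b| (acc << 8) | b as u64)
}

/// CRC-64/NVME over `data`, 16 bytes per iteration, with a bytewise tail.
fn crc64_nvme_table(tables: &[[u64; 256]; 16], data: &[u8]) -> u64 {
    let mut crc = !0u64;
    let mut chunks = data.chunks_exact(16);
    for chunk in &mut chunks {
        // The running CRC is folded into the first 8 bytes of the block.
        let lo = crc ^ load_le(&chunk[0..8]);
        let hi = load_le(&chunk[8..16]);
        crc = 0;
        for j in 0..8 {
            // Block byte j is followed by 15 - j more bytes in this block.
            crc ^= tables[15 - j][((lo >> (8 * j)) & 0xff) as usize];
            crc ^= tables[7 - j][((hi >> (8 * j)) & 0xff) as usize];
        }
    }
    // Any remaining bytes (fewer than 16) are handled one at a time.
    for &b in chunks.remainder() {
        crc = (crc >> 8) ^ tables[0][((crc ^ b as u64) & 0xff) as usize];
    }
    !crc
}
```

The sixteen lookups per block are independent of one another, which is what lets this approach run faster than a one-byte-per-step table loop. A production implementation would normally precompute the tables at build time.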
Algorithm | Throughput (x86_64) | Throughput (aarch64) |
---|---|---|
crc 3.0.1 | 0.5 GiB/s | 0.3 GiB/s |
crc64fast-nvme (table) | 2.3 GiB/s | 1.8 GiB/s |
crc64fast-nvme (SIMD) | 28.2 GiB/s | 20.0 GiB/s |
crc64fast-nvme (VPCLMULQDQ) | 52 GiB/s | n/a |
Experimental "Vector Carry-Less Multiplication of Quadwords" (VPCLMULQDQ) support
Using Rust's support for AVX512 intrinsics, specifically VPCLMULQDQ, we can massively improve throughput for x86_64 processors which support them (Intel Ice Lake+ and AMD Zen4+).
Specifically, on an m7i.8xlarge EC2 instance (4th gen Xeon, aka Sapphire Rapids), throughput approximately doubles from ~26 GiB/s to ~52 GiB/s.
Since these are currently marked as unstable features in Rust, you'll need to build with nightly and enable the vpclmulqdq feature:
rustup toolchain install nightly
cargo +nightly build --features="vpclmulqdq" -r
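Whichever build is used, the fastest path still depends on what the CPU reports at runtime. The sketch below shows one way such a selection could look using the standard library's feature-detection macros. The `Backend` enum and `select_backend` function are hypothetical names for illustration, not part of this crate's API, and the sketch only covers the stable PCLMULQDQ and PMULL paths.

```rust
/// Which implementation to use (hypothetical type, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    /// Carryless-multiplication path (PCLMULQDQ or PMULL).
    Simd,
    /// Portable table-based fallback.
    Table,
}

/// Pick a backend from the CPU features detected at runtime.
fn select_backend() -> Backend {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("pclmulqdq") && is_x86_feature_detected!("sse4.1") {
            return Backend::Simd;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        // Assumption: Rust's "aes" target feature covers PMULL on AArch64.
        if std::arch::is_aarch64_feature_detected!("aes")
            && std::arch::is_aarch64_feature_detected!("neon")
        {
            return Backend::Simd;
        }
    }
    // Any other architecture, or a CPU without the required features.
    Backend::Table
}
```

Calling `select_backend()` once and caching the result avoids repeating the detection on every checksum call.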
References
- crc32-fast - Original crc32 implementation in Rust.
- crc64-fast - Original CRC-64/XZ implementation in Rust (from which this project was forked).
- Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction - Intel's paper.
- NVM Express® NVM Command Set Specification - The NVMe spec, including CRC-64-NVME (with incorrect endian Check value).
- CRC-64/NVME - The CRC-64/NVME quick definition.
- Linux implementation - Linux implementation of CRC-64/NVME.
- C++ artifacts implementation - Inspiration C++ for the Rust code in calculate_pclmulqdq_artifacts.rs.
- Intel isa-l GH issue #88 - Additional insight into generating artifacts.
- StackOverflow PCLMULQDQ CRC32 answer - Insightful answer to implementation details for CRC32.
- StackOverflow PCLMULQDQ CRC32 question - Insightful question & answer to CRC32 implementation details.
- AWS S3 announcement about CRC64-NVME support
- AWS S3 docs on checking object integrity using CRC64-NVME
- Vector Carry-Less Multiplication of Quadwords (VPCLMULQDQ) details
License
crc64fast-nvme is dual-licensed under
- Apache 2.0 license (LICENSE-Apache or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or https://opensource.org/licenses/MIT)