Version 0.1.0, released Oct 14, 2024 (1 unstable release)
ChaCha8Rand Implementation in Rust
Reproducible, robust and (last but not least) fast pseudorandomness.
This crate implements the ChaCha8Rand specification, originally designed
for Go's math/rand/v2 package. The language-independent specification and test
vector help with long-term reproducibility and interoperability. Building on
the ChaCha8 stream cipher ensures high statistical quality and removes entire
classes of "you're holding it wrong"-style problems that lead to sub-par output.
It's also carefully designed and implemented (using SIMD instructions when
available) to be so fast that it shouldn't ever be a bottleneck. However, it
should not be used for cryptography.
See the documentation for more details.
Dual-licensed under Apache 2.0 or MIT at your option.
Quick Start
In the interest of simplicity and reproducibility, there's no global or thread-local generator.
You'll always have to pick a 32-byte seed yourself, create a ChaCha8Rand instance from it,
and pass it around in your program. Usually, you'll generate an unpredictable seed at startup by
default, but store or log it somewhere and support running the program again with the same seed.
For the first half, it's usually best to provide a full 256 bits of entropy via the getrandom
crate:
use chacha8rand::ChaCha8Rand;
let mut seed = [0; 32];
getrandom::getrandom(&mut seed).expect("getrandom failure is 'highly unlikely'");
let mut rng = ChaCha8Rand::new(&seed);
// Now we can make random choices
let heads_or_tails = if rng.read_u32() & 1 == 0 { "heads" } else { "tails" };
println!("The coin came up {heads_or_tails}.");
The best place and format to store the seed will vary, but 64 hex digits is a good default because it can be copied and pasted as (technically) human-readable text. However, if you want to let humans pick a seed by hand for any reason, then asking them for exactly 64 hex digits would be a bit rude. For such cases, it's more convenient to accept a UTF-8 string and feed it into a hash function with 256-bit output, such as SHA-256 or BLAKE3.
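As a rough illustration of the "64 hex digits" format and the replay workflow, here is a minimal sketch using only the standard library. The helper names seed_to_hex, seed_from_hex and choose_seed are hypothetical and not part of chacha8rand; the fresh callback stands in for whatever entropy source you use (for example, the getrandom call shown earlier).

```rust
/// Encode a 32-byte seed as 64 lowercase hex digits, suitable for logging.
fn seed_to_hex(seed: &[u8; 32]) -> String {
    seed.iter().map(|b| format!("{b:02x}")).collect()
}

/// Parse exactly 64 hex digits (either case, surrounding whitespace ignored)
/// back into a 32-byte seed.
fn seed_from_hex(text: &str) -> Result<[u8; 32], String> {
    let text = text.trim();
    // Checking every character up front also rules out a leading '+',
    // which `from_str_radix` would otherwise accept.
    if text.len() != 64 || !text.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err("expected exactly 64 hex digits".to_string());
    }
    let mut seed = [0u8; 32];
    for (i, byte) in seed.iter_mut().enumerate() {
        // Two hex digits per byte; slicing is safe because the text is ASCII.
        *byte = u8::from_str_radix(&text[2 * i..2 * i + 2], 16)
            .map_err(|e| e.to_string())?;
    }
    Ok(seed)
}

/// Use a previously logged seed if one was supplied (e.g. on the command
/// line), otherwise ask the caller-provided entropy source for a fresh one.
fn choose_seed(
    replay: Option<&str>,
    fresh: impl FnOnce() -> [u8; 32],
) -> Result<[u8; 32], String> {
    match replay {
        Some(hex) => seed_from_hex(hex),
        None => Ok(fresh()),
    }
}
```

A program would typically call choose_seed once at startup, log seed_to_hex of the result, and pass the seed to ChaCha8Rand::new.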
In any case, once you've created a ChaCha8Rand instance with an initial seed, you can
consume its output as a sequence of bytes or as a stream of 32-bit or 64-bit integers. If you need
support for other types, for integers in a certain interval, or other distributions, you might
want to enable the rand_core_0_6 crate feature to combine ChaCha8Rand with the rand crate.
Another thing you can do (even without rand) is deriving seeds for multiple sub-RNGs
that are used for different purposes, without creating correlation between those different
streams of randomness. The ability to do this with confidence is one reason why I decided to
implement ChaCha8Rand in the first place, so there's a little helper for it:
use chacha8rand::ChaCha8Rand;
let mut seed_gen = ChaCha8Rand::new(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ123456");
// Create new instances with seeds from `seed_gen`...
let mut rng1 = ChaCha8Rand::new(&seed_gen.read_seed());
let mut rng2 = ChaCha8Rand::new(&seed_gen.read_seed());
assert_ne!(rng1.read_u64(), rng2.read_u64());
// ... and/or re-seed an existing instance in-place:
rng1.set_seed(&seed_gen.read_seed());
Note that using the output of a statistical RNG to seed other instances of the same algorithm (or a related one) is often risky or outright broken. Even generators that explicitly support it, like SplitMix, often distinguish "generate a new seed" from ordinary random output. ChaCha8Rand has no such caveats: its state space is so large, and its output is of such high quality, that there's no risk of creating overlapping output sequences or correlations between generators seeded this way. Indeed, every instance regularly replaces its current seed with some of its own output. Using the rest of the output as seeds for other instances works just as well.
Don't Use This For Cryptography
ChaCha8Rand derives its high quality from ChaCha8, which is a secure stream cipher as far as
anyone knows today (although in most cases you also want ciphertext authenticity, i.e., an AEAD
mode). Thus, ChaCha8Rand can mostly be used as a black-box source of high quality
pseudorandomness. If there were any patterns or biases in its output, or if the output sequences
for different seeds (with some known relation between them) were not statistically independent,
that would most likely imply a major breakthrough in the cryptanalysis of ChaCha. However, that
doesn't mean this crate is a replacement for cryptographically secure randomness from the
operating system or libraries that wrap it, such as getrandom.
As Russ Cox and Filippo Valsorda wrote while introducing the algorithm, regarding
accidental use of Go's math/rand
to generate cryptographic keys and other secrets:
Using Go 1.20, that mistake is a serious security problem that merits a detailed investigation to understand the damage. [...] Using Go 1.22, that mistake is just a mistake. It’s still better to use crypto/rand, because the operating system kernel can do a better job keeping the random values secret from various kinds of prying eyes, the kernel is continually adding new entropy to its generator, and the kernel has had more scrutiny. But accidentally using math/rand is no longer a security catastrophe.
Keep in mind that Go has a global generator which is seeded from OS-provided entropy on startup. If you pick a seed yourself (which you always do when using this crate), the output of the generator is at best as unpredictable as that seed was. There are also other design decisions in this implementation that would be inappropriate for security-sensitive applications. For example, it doesn't handle process forking or VM image cloning, it doesn't even try to scrub generated data from its internal buffer after it's consumed, and it sacrifices so-called fast key erasure in favor of needing fewer bytes to serialize the current state.
Crate Features
The crate is no_std and "no alloc" by default. There are currently two crate features you
might enable when depending on chacha8rand. You can manually add them to Cargo.toml
(features = [...] key) or use a command like cargo add chacha8rand -F rand_core_0_6.
The features are:
std: opts out of #![no_std] and enables runtime detection of target_features for higher performance on some targets. It does not (currently) affect the API surface, so ideally libraries leave this decision to the top-level binary. For forward compatibility, enabling this feature always adds a dependency on std, even on targets where std isn't needed today.
rand_core_0_6: implements the RngCore and SeedableRng traits from rand_core v0.6, for integration with rand v0.8. The upcoming v0.9 release of the rand crates will get another feature so that ChaCha8Rand can implement both the new and the old versions of these traits at the same time.
Neither feature is enabled by default, so you don't need default-features = false
/ cargo add --no-default-features. In fact, please don't, because then your code might break
if a later version moves existing functionality under a new on-by-default feature.
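For the manual route, a dependency entry might look roughly like the following sketch. The version requirement "0.1" is an assumption based on the 0.1.0 release; the feature name is the one listed above. Default features are deliberately left untouched, in line with the advice above.

```toml
[dependencies]
# Enables the RngCore / SeedableRng implementations for use with rand v0.8.
chacha8rand = { version = "0.1", features = ["rand_core_0_6"] }
```

A top-level binary that also wants runtime SIMD detection could add "std" to the same features list.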
There are also some features with an "unstable" prefix in their name. Anything covered by these is for internal use only (e.g., the crate's benchmarks are compiled as a separate crate) and explicitly not covered by SemVer.
Minimum Supported Rust Version (MSRV)
There is no MSRV policy at the moment, so features from new stable Rust versions may be adopted as soon as they come out (but in practice I don't expect to make frequent releases). If you need to use this crate with a specific older version, you can open an issue and we can take a look at how easy or difficult it would be to support that version.
Drawbacks
The main reasons why you might not want to use this crate are the use of unsafe for accessing
SIMD intrinsics and the relatively large buffer (4x larger than the Go implementation). The
latter means each RNG instance is a little over a thousand bytes large, which may be an issue if
you want to have many instances and care about memory consumption and/or only consume a small
amount of randomness from most of those instances.