#data-structures #byte-sequences #fuzzing #entropy #convert-bytes #arbitrary

no-std entropic

Traits for converting Rust data structures to/from unstructured bytes

1 unstable release

0.1.0 Jul 2, 2024

#566 in Testing

MIT/Apache

65KB
1.5K SLoC


Entropic - Easily Convert Rust Structures To/From Unstructured Bytes

The Entropic trait enables the conversion of raw bytes into Rust data structures and back.

Mapping to/from unstructured bytes

entropic strives to provide a few guarantees about the way it maps data:

1. Every data structure value will be mapped to by one or more unstructured byte sequences.

In mathematical terms, this means that the function from unstructured bytes to data structures is surjective. It's not uncommon for multiple different unstructured byte sequences to map to the same data structure value, so it should not be assumed that a different byte input will lead to different output.

Because the mapping is surjective rather than one-to-one, some data structure values may have a slightly greater chance of being chosen than others. Certain schemes provide guaranteed bounds on the probability delta between different values, and the PureRandomScheme aims to provide cryptographically secure guarantees that each resulting output value has identical odds of occurring given an input. However, these guarantees come at the cost of consuming many more bytes of input entropy on average than other schemes.
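As a rough illustration of where this bias comes from, the sketch below maps a single byte onto a three-valued type with a modulo. Everything in it (Color, color_from_byte, preimage_counts, color_unbiased) is hypothetical and is not part of the entropic API. Rejection sampling is shown as one generic way to even out the odds at the cost of extra input bytes; the source does not say that PureRandomScheme works this way.

```rust
// Toy illustration only; this is not the entropic API or any of its schemes.

/// Hypothetical three-valued type standing in for "a data structure".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Color {
    Red,
    Green,
    Blue,
}

/// Surjective map from one byte to a Color: every Color is reachable,
/// but 256 is not divisible by 3, so the preimages are uneven.
fn color_from_byte(b: u8) -> Color {
    match b % 3 {
        0 => Color::Red,
        1 => Color::Green,
        _ => Color::Blue,
    }
}

/// Count how many of the 256 possible bytes map to each Color,
/// indexed in declaration order (Red, Green, Blue).
fn preimage_counts() -> [usize; 3] {
    let mut counts = [0usize; 3];
    for b in 0..=u8::MAX {
        counts[color_from_byte(b) as usize] += 1;
    }
    counts
}

/// Rejection sampling: discard the byte 255 so that the remaining 255
/// byte values split evenly (85 each). Returns None if the input is
/// exhausted before an acceptable byte is found.
fn color_unbiased(bytes: &mut impl Iterator<Item = u8>) -> Option<Color> {
    bytes.find(|&b| b < 255).map(color_from_byte)
}
```

Here preimage_counts() returns [86, 85, 85], so Red is very slightly more likely under the plain modulo mapping. color_unbiased removes that skew but may read more than one byte per value, which mirrors the entropy cost described above.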

2. Data structures will map to unstructured bytes that always map back to that same data structure.

In mathematical terms, suppose U is the set of all possible unstructured byte sequences and S is the set of all possible data structure values. entropic provides two functions (from_entropy_source and to_entropy_sink) that can respectively be represented by f: U -> S and g: S -> U. For every value s in S, entropic aims to guarantee that f(g(s)) == s. However, it is not the case that g(f(u)) == u for all u in U--several byte sequences may map to the same data structure value, but that value can only map back to one of those sequences.
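The asymmetry between f(g(s)) == s and g(f(u)) == u can be seen in a minimal sketch. The decode and encode functions below are hypothetical stand-ins for the roles of f and g; they do not reflect the real signatures of from_entropy_source or to_entropy_sink.

```rust
// Toy illustration only; `decode` and `encode` play the roles of f and g
// from the text and are not the entropic API.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Direction {
    North,
    East,
    South,
    West,
    Up,
}

/// Every value in S, listed in declaration order.
const ALL: [Direction; 5] = [
    Direction::North,
    Direction::East,
    Direction::South,
    Direction::West,
    Direction::Up,
];

/// f: U -> S. Total and surjective: any byte yields some Direction.
fn decode(byte: u8) -> Direction {
    ALL[(byte % 5) as usize]
}

/// g: S -> U. Picks one canonical byte among the many that decode to `d`.
fn encode(d: Direction) -> u8 {
    d as u8
}

/// Checks f(g(s)) == s for every s in S.
fn round_trips_for_all_values() -> bool {
    ALL.iter().all(|&d| decode(encode(d)) == d)
}
```

In this sketch round_trips_for_all_values() holds, while encode(decode(7)) yields 2 rather than 7: the bytes 2, 7, 12, and so on all decode to South, but South encodes back only to 2.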

This property is particularly useful when using entropic for fuzzing data structures. Fuzzing has been shown to be much more effective across domains when provided with semantically-valid starting seeds that explore interesting paths. As such, entropic allows one to define an expected data structure and convert it to a byte sequence that will subsequently map to that same structure during fuzzing.

Improvements over other libraries

  • Allows converting back into unstructured bytes in a (semi-)bijective manner: a structure converted to bytes always converts back to that same structure
  • Guarantees formation of a data structure so long as sufficient data is available, regardless of the byte contents. Other crates (such as arbitrary) require certain structure within the raw input (such as syntactically valid UTF-8 for strings) to succeed
  • Modular entropy conversion scheme--if you would rather have data map in a different way, you can choose what entropy scheme maps data into structures, or even implement your own mapping!
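To make the second point concrete, the sketch below contrasts a strict decoder, which rejects input that is not valid UTF-8, with a total mapping that produces a string from any bytes. Both functions are hypothetical and use only the standard library. The total mapping shown (one byte to one char) is just an assumption chosen for brevity: it is not how entropic builds strings, and it cannot reach every possible string.

```rust
// Toy illustration only; neither function is part of entropic or arbitrary.

/// Strict decoding: succeeds only when the bytes are already valid UTF-8.
fn strict_string(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

/// Total decoding: every byte becomes the char with the same code point
/// (U+0000 through U+00FF), so no input is ever rejected.
fn total_string(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}
```

For the input [0xff, 0xfe], strict_string returns None, whereas total_string still yields a two-character string. A fuzzer feeding random bytes into the strict version would waste every input that fails validation.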

Dependencies

~0–310KB