#u64 #compact #ufotofu-codec #compact-u64

no-std compact_u64

Endian-aware fixed-width integer codecs for ufotofu_codec

1 unstable release

new 0.1.0 Apr 9, 2025

#12 in #u64

MIT/Apache

24KB
338 lines

UFOTOFU Codec Endian

Endian-aware fixed-width integer codecs for ufotofu_codec.


lib.rs:

Compact u64

Compact encodings for unsigned 64-bit integers. The general idea is the following:

  • Each encoding is preceded by a tag of two to eight (inclusive) bits.
  • Each u64 can be encoded by setting the tag to the greatest possible number and then encoding the u64 as an eight-byte big-endian integer.
  • Each u64 that fits into four bytes can be encoded by setting the tag to the second-greatest possible number and then encoding the u64 as a four-byte big-endian integer.
  • Each u64 that fits into two bytes can be encoded by setting the tag to the third-greatest possible number and then encoding the u64 as a two-byte big-endian integer.
  • Each u64 that fits into one byte can be encoded by setting the tag to the fourth-greatest possible number and then encoding the u64 as a one-byte big-endian integer.
  • If the tag has more than two bits, then each u64 that is less than the fourth-greatest tag can be encoded in the tag directly, followed by no further bytes.
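The rules above can be sketched in plain Rust without the crate. The function name encode_compact is hypothetical and not part of this crate's API; it returns the tag value and the bytes that follow it, always picking the shortest option.

```rust
/// Hypothetical helper (not the crate's API): returns the tag value and the
/// big-endian payload bytes for `n`, given a tag of `tag_width` bits.
fn encode_compact(n: u64, tag_width: u8) -> (u8, Vec<u8>) {
    assert!((2..=8).contains(&tag_width), "tag width must be 2..=8");
    // Greatest number that fits into the tag, e.g. 255 for an eight-bit tag.
    let max_tag: u64 = (1u64 << tag_width) - 1;

    if n < max_tag - 3 {
        // Small enough to live in the tag itself (never true for two-bit tags).
        (n as u8, Vec::new())
    } else if n <= u64::from(u8::MAX) {
        // Fourth-greatest tag, one byte.
        ((max_tag - 3) as u8, vec![n as u8])
    } else if n <= u64::from(u16::MAX) {
        // Third-greatest tag, two bytes.
        ((max_tag - 2) as u8, (n as u16).to_be_bytes().to_vec())
    } else if n <= u64::from(u32::MAX) {
        // Second-greatest tag, four bytes.
        ((max_tag - 1) as u8, (n as u32).to_be_bytes().to_vec())
    } else {
        // Greatest tag, eight bytes.
        (max_tag as u8, n.to_be_bytes().to_vec())
    }
}
```

With an eight-bit tag, for example, 7 is stored in the tag alone, while 300 gets the tag 253 followed by the two bytes 0x01 0x2C.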

TagWidth is the type of possible tag widths (integers between two and eight inclusive). EncodingWidth is the type of the possible numbers of bytes that are needed for encoding a compact u64 beyond its tag: zero, one, two, four, or eight. EncodingWidth::min_width takes a u64 and a TagWidth, and returns the minimal EncodingWidth for compactly encoding the given number with a tag of the given width.
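To make the relationship between tag widths and encoding widths concrete, here is a minimal stand-alone model. The names Width, byte_count, and min_width are illustrative stand-ins chosen for this sketch; the crate's actual EncodingWidth and TagWidth types and their signatures may differ.

```rust
/// Illustrative stand-in for the possible payload widths (not the crate's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Width {
    Zero,
    One,
    Two,
    Four,
    Eight,
}

impl Width {
    /// Number of bytes that follow the tag.
    fn byte_count(self) -> usize {
        match self {
            Width::Zero => 0,
            Width::One => 1,
            Width::Two => 2,
            Width::Four => 4,
            Width::Eight => 8,
        }
    }
}

/// Smallest width that can hold `n` when the tag has `tag_width` bits.
/// The tag width is a plain u8 here and is assumed to be in 2..=8.
fn min_width(n: u64, tag_width: u8) -> Width {
    assert!((2..=8).contains(&tag_width), "tag width must be 2..=8");
    let max_tag: u64 = (1u64 << tag_width) - 1;
    if n < max_tag - 3 {
        Width::Zero
    } else if n <= 0xFF {
        Width::One
    } else if n <= 0xFFFF {
        Width::Two
    } else if n <= 0xFFFF_FFFF {
        Width::Four
    } else {
        Width::Eight
    }
}
```

Note that a two-bit tag never yields a zero-byte width in this model: all four tag values are needed to signal one, two, four, or eight bytes.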

The Tag type represents a tag: its width, together with the actual tag data. Its Tag::from_raw method can be used to create Tags from single bytes when decoding compact u64s.

Finally, the CompactU64 type is the central type of the crate, a wrapper around u64 that allows for encoding and decoding. It implements Encodable for emitting a minimal eight-bit tag followed by the minimal number of bytes for encoding the u64. The Decodable implementation conversely reads a single byte as an eight-bit tag, and then reads further bytes according to the tag to obtain the u64. There is also a DecodableCanonic implementation, which errors if the decoded bytes are not the shortest possible encoding of the u64.
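The difference between plain and canonic decoding with an eight-bit tag can be illustrated on byte slices. This sketch does not use ufotofu producers or the crate's traits; decode_compact, DecodeError, and the canonic flag are hypothetical names introduced only for illustration.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum DecodeError {
    UnexpectedEnd,
    NotCanonic,
}

/// Decodes one compact u64 with an eight-bit tag from the front of `input`.
/// Returns the value and the number of bytes consumed. When `canonic` is
/// true, encodings that are longer than necessary are rejected.
fn decode_compact(input: &[u8], canonic: bool) -> Result<(u64, usize), DecodeError> {
    let (&tag, rest) = input.split_first().ok_or(DecodeError::UnexpectedEnd)?;
    // With eight bits, tags 252..=255 announce 1, 2, 4, or 8 payload bytes.
    let len: usize = match tag {
        252 => 1,
        253 => 2,
        254 => 4,
        255 => 8,
        // Any smaller tag is the value itself.
        _ => return Ok((u64::from(tag), 1)),
    };
    let bytes = rest.get(..len).ok_or(DecodeError::UnexpectedEnd)?;
    // Interpret the payload as a big-endian integer.
    let n = bytes.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b));

    if canonic {
        // Payload length the shortest encoding of `n` would have used.
        let minimal: usize = if n < 252 {
            0
        } else if n <= 0xFF {
            1
        } else if n <= 0xFFFF {
            2
        } else if n <= 0xFFFF_FFFF {
            4
        } else {
            8
        };
        if minimal != len {
            return Err(DecodeError::NotCanonic);
        }
    }
    Ok((n, 1 + len))
}
```

For instance, the bytes 252, 7 decode to 7 in the plain mode but are rejected in the canonic mode, because 7 fits into the tag directly.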

Further, CompactU64 implements RelativeEncodable for encoding only the number (but not the tag) relative to any given EncodingWidth, and RelativeDecodable for decoding the number (but not a tag) relative to any given Tag. The corresponding RelativeDecodableCanonic implementation again enforces minimal encodings.
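Relative encoding is useful when the tag bits are packed into a byte shared with other data, so the number's payload has to be written and read separately. The following is a rough model on byte slices, with hypothetical function names; in particular, returning None when the number does not fit the requested width is an assumption of this sketch, not documented behaviour of the crate.

```rust
/// Emits only the payload of `n` using exactly `width` bytes (0, 1, 2, 4, or 8).
/// For width 0 nothing is emitted; the caller must have stored `n` in the tag.
fn encode_relative(n: u64, width: usize) -> Option<Vec<u8>> {
    if ![0, 1, 2, 4, 8].contains(&width) {
        return None;
    }
    let be = n.to_be_bytes();
    let (dropped, kept) = be.split_at(8 - width);
    // Assumption: refuse to truncate a number that needs more bytes.
    if width > 0 && dropped.iter().any(|&b| b != 0) {
        return None;
    }
    Some(kept.to_vec())
}

/// Reads the payload announced by `tag` (a tag of `tag_width` bits) from `input`.
fn decode_relative(tag: u8, tag_width: u8, input: &[u8]) -> Option<u64> {
    if !(2..=8).contains(&tag_width) {
        return None;
    }
    let max_tag = ((1u16 << tag_width) - 1) as u8;
    if tag > max_tag {
        return None;
    }
    // Distance from the greatest tag selects the payload length.
    let len: usize = match max_tag - tag {
        0 => 8,
        1 => 4,
        2 => 2,
        3 => 1,
        // Smaller tags hold the number directly.
        _ => return Some(u64::from(tag)),
    };
    let bytes = input.get(..len)?;
    Some(bytes.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b)))
}
```

A canonic variant would additionally compare the payload length against the minimal width for the decoded number and the tag's width, and reject any mismatch.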

Dependencies

~0.6–1MB
~23K SLoC