#little-endian #endian #binary-data #big-endian #byteorder #networking

simple_endian

A crate for defining endianness within your data structures, to make handling portable data structures simpler

12 releases

0.3.2 Mar 9, 2024
0.3.1 Mar 4, 2024
0.2.1 Sep 21, 2021
0.2.0 Aug 31, 2020
0.1.6 Dec 30, 2019

#132 in Data structures


Used in 2 crates

MIT license

57KB
1K SLoC

simple-endian

As of the 0.3 release, this library works on stable Rust.

Yet another library for handling endian in Rust. We use Rust's type system to ensure correct conversions and build data types with explicit endianness defined for more seamless portability of data structures between processor types. It should be fairly lightweight, and supports #![no_std].

The key difference between this crate and other crates for handling endian is that in this crate, you aren't doing conversions manually at all. You are just working in the endian that is appropriate to the data structures that you're dealing with, and we try to provide the needed traits and methods to do this in as normal a way as possible.

Isn't there already a library for this?

Yes, there are several. But I'm not entirely happy with any of them. Specifically, most of the libraries out there right now focus on providing functions for doing endian conversions. Here are a few of them:

byteorder has over 11 million downloads, and is clearly the prevailing way to handle endian in Rust. However, it relies on programmers writing specific logic to swap bytes and requires accessing memory in ways that are unlike normal patterns for Rust. But really, the only difference between a big- and little-endian value is the interpretation. It shouldn't require a drastically different pattern of code to access them.

So, why create another one?

Because I think a better approach is to define your endianness as part of your data definition rather than in the logic of your program, and then to make byte order swaps as transparent as possible while still ensuring correctness. And because the more like normal Rust data types and operations this is, the more likely it is that people will write portable code and data structures in the first place.

The philosophy of this crate is that you define your endian when you write your data structures, and then you use clear, imperative logic to mutate it without needing to think about the details or the host endian. This makes it fundamentally different from crates that just give you a way to read a &[u8; 8] into a u64.
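For contrast, the conversion-function style mentioned above can be sketched with the standard library alone. This is not the API of byteorder or any other crate, and `read_length_be` is a hypothetical helper named only for this illustration. It shows that the byte order lives in the program logic, so picking the wrong conversion still compiles.

```rust
// Function-style endian handling, using only the standard library.
// `read_length_be` is a hypothetical helper, not part of any crate.

/// Interprets 8 bytes as a big-endian u64. The caller has to remember
/// that this particular field is big-endian at every access site.
fn read_length_be(raw: &[u8; 8]) -> u64 {
    u64::from_be_bytes(*raw)
}

fn main() {
    let raw: [u8; 8] = [0, 0, 0, 0, 0, 0, 0, 4];
    let length = read_length_be(&raw);
    println!("length: {}", length);

    // Using the wrong conversion still compiles; it just yields a wrong value.
    let wrong = u64::from_le_bytes(raw);
    println!("misread as little-endian: {:#x}", wrong);
}
```

With endian-tagged types, the second mistake is the kind of error that is meant to be caught at compile time.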

Goals of this project

The goals of this crate are as follows:

  1. Safely provide specific-endian types with low or no runtime overhead. There should be no runtime penalty when the host architecture matches the specified endianness, and a very low penalty for loads and stores otherwise.
  2. Straightforward, architecture-independent declarative syntax which ensures that load and store operations are correct.
  3. Ergonomic use patterns that maximize clarity and convenience without sacrificing safety or correctness.
  4. Incorrect handling of data should generate clear type errors at compile time.
  5. Determination of correct endianness should be at declaration, and should not need to be repeated unless converting to a different endianness.
  6. Support for all of Rust's built-in types where endianness is relevant.
  7. The only dependency needed is the core crate. The std crate is used, however, for tests and benchmarks.

Examples

use simple_endian::*;

let foo: u64be = 4.into();    //Stores 4 in foo in big endian.

println!("raw: {:x}, value: {:x}", foo.to_bits(), foo);

The output will depend on what sort of computer you're using. If you're running a little-endian system, such as x86 (PCs, Macs, etc.), you will see the big-endian representation interpreted as if it were little-endian, since that is how it's stored in memory. Note that the .to_bits() method is mostly there for debugging purposes, and should not be used often.
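The difference between the raw bits and the value can be reproduced with the standard library alone. The sketch below does not use this crate; `raw_big_endian_bits` is a hypothetical helper standing in for what a big-endian wrapper would hold internally.

```rust
// Standard-library sketch of "raw bits" versus "value" for a big-endian u64.
// `raw_big_endian_bits` is a hypothetical helper for illustration only.

/// Returns the native integer whose in-memory bytes are the big-endian
/// encoding of `value`.
fn raw_big_endian_bits(value: u64) -> u64 {
    value.to_be()
}

fn main() {
    let raw = raw_big_endian_bits(4);
    let value = u64::from_be(raw);
    // On a little-endian host `raw` is the byte-swapped value
    // (0x0400_0000_0000_0000); on a big-endian host it is 4.
    // `value` is 4 on both.
    println!("raw: {:x}, value: {:x}", raw, value);
}
```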

This works in reverse as well:

use simple_endian::*;

let foo: u64be = 4.into();
let bar = u64::from(foo);

println!("value: {:x}", bar);

If you prefer, there's a convenience method so that you don't need to explicitly convert back to the basic native type.

use simple_endian::*;

let foo: u64be = 4.into();
let bar = foo.to_native();

println!("value: {:x}", bar);

And the type system ensures that native-endian values are never written without being converted into the proper endian.

use simple_endian::*;

let mut foo: u64be = 4.into();
foo = 7;     // Will not compile without .into().

How it works

At its core, this crate centers around one trait, called SpecificEndian<T>, and the generic structs BigEndian<T> and LittleEndian<T>. SpecificEndian<T> is required to make BigEndian<T> and LittleEndian<T> structs. Any data type that implements SpecificEndian, even if it handles endianness in unusual ways, can be assigned BigEndian and LittleEndian variants using the structs in this crate, the main possible limitation being that they need to use the same underlying structure. In fact, u64be is just a type alias for BigEndian<u64>. The BigEndian<T> and LittleEndian<T> structs add no memory footprint; in most cases they use the type T itself to store the data. The only purpose of the structs is to tag the values so that Rust's type system can enforce correct accesses. This means that they can be used directly within larger structs, and then the entire struct can be written to disk, sent over a network socket, and otherwise shared between processor architectures using the same code regardless of host endian, using declarative logic without any conditionals.
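The design described above can be modelled in a few lines. The following is a simplified, hypothetical sketch and not the crate's source: the trait here is non-generic, its method names `to_big_endian` and `from_big_endian` are invented for the illustration, and only u64 is implemented.

```rust
// A simplified, hypothetical model of the design described above.
// The names mirror the crate's, but these definitions are illustrative
// and are not the crate's actual source.

/// Types that know how to convert themselves to and from big-endian storage.
trait SpecificEndian: Copy {
    fn to_big_endian(self) -> Self;
    fn from_big_endian(self) -> Self;
}

impl SpecificEndian for u64 {
    fn to_big_endian(self) -> Self {
        self.to_be()
    }
    fn from_big_endian(self) -> Self {
        u64::from_be(self)
    }
}

/// Zero-overhead tag: same size and layout as `T`.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(transparent)]
struct BigEndian<T: SpecificEndian>(T);

impl<T: SpecificEndian> BigEndian<T> {
    /// Converts a native value into big-endian storage.
    fn new(native: T) -> Self {
        BigEndian(native.to_big_endian())
    }
    /// Converts the stored bits back into a native value.
    fn to_native(self) -> T {
        self.0.from_big_endian()
    }
}

#[allow(non_camel_case_types)]
type u64be = BigEndian<u64>;

fn main() {
    let stored: u64be = BigEndian::new(4);
    // The wrapper adds no memory footprint.
    assert_eq!(std::mem::size_of::<u64be>(), std::mem::size_of::<u64>());
    println!("native value: {}", stored.to_native());
}
```

Because the wrapper is a distinct type, assigning a bare u64 to a `u64be` variable is a type error, which is the property the real crate relies on.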

This crate provides SpecificEndian implementations for most of the built-in types in Rust, including:

  • Single-byte values (i8, u8, bool), although this really doesn't do much but provide completeness.
  • The multi-byte integers: u16, u32, u64, u128, usize, i16, i32, i64, i128, isize
  • The floats: f32, f64.

At the time of this writing, the only common built-in type that doesn't have an implementation is char, and this is because some values of char that would be possible from the binary representation would cause a panic. Usually, you wouldn't want to store a char directly anyway, so this is probably a small limitation.
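The problem with char can be seen using only the standard library: a byte-swapped scalar value is usually not a valid char. In this sketch, `swapped_char` is a hypothetical helper, and the checked constructor `char::from_u32` is used so that invalid values show up as `None`.

```rust
// Why byte-swapped `char` storage is awkward: not every u32 is a valid char.
// `swapped_char` is a hypothetical helper, not part of the crate.

/// Byte-swaps the scalar value of `c` and tries to read the result as a char.
fn swapped_char(c: char) -> Option<char> {
    char::from_u32((c as u32).swap_bytes())
}

fn main() {
    // 'A' is 0x0000_0041; swapped it is 0x4100_0000, above char::MAX (0x10FFFF).
    println!("{:?}", swapped_char('A'));
}
```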

This crate also provides implementations of a variety of useful traits for the types that it wraps, including boolean logic (bitwise) implementations for the integer types and for bools. This allows most boolean logic operations to be performed without any endian conversions using ordinary operators. You are required to use same-endian operands, however, like this:

use simple_endian::*;

let ip: BigEndian<u32> = 0x0a00000a.into();
let subnet_mask = BigEndian::from(0xff000000u32);

let network = ip & subnet_mask;

println!("value: {:x}", network);

As you see, the network is calculated by masking the IP address with the subnet mask in a way that the programmer barely has to think about the conversion operations.
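The reason no conversion is needed for this kind of masking is that bitwise operators act on each bit independently, so they give the same answer before or after a byte swap. A standard-library sketch of that property, with `mask_big_endian` as a hypothetical helper that works on raw big-endian bits:

```rust
// Bitwise AND commutes with byte swapping, so same-endian operands can be
// combined without converting. `mask_big_endian` is a hypothetical helper.

/// ANDs two values that are both already stored in big-endian form.
/// The result is also in big-endian form.
fn mask_big_endian(ip_be: u32, mask_be: u32) -> u32 {
    ip_be & mask_be
}

fn main() {
    let ip: u32 = 0x0a00000a;
    let mask: u32 = 0xff000000;

    let network_be = mask_big_endian(ip.to_be(), mask.to_be());

    // Converting afterwards gives the same result as masking native values.
    assert_eq!(u32::from_be(network_be), ip & mask);
    println!("network: {:x}", u32::from_be(network_be));
}
```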

Alternatively, you might want to define a structure with the elements typed so that it can be moved around as a unit.

use simple_endian::*;

#[derive(Debug)]
#[repr(C)]
struct NetworkConfig {
    address: BigEndian<u32>,
    mask: BigEndian<u32>,
    network: BigEndian<u32>,
}

let config = NetworkConfig {
    address: 0x0a00000a.into(),
    mask: 0xff000000.into(),
    network: (0x0a00000a & 0xff000000).into(),
};

println!("value: {:x?}", config);

Note that the println! will convert the values to native endian.

And finally, this crate implements a number of traits that allow most of the basic arithmetic operators to be used on the Big- and LittleEndian variants of all of the types, where appropriate, including for the floats. There is a certain amount of overhead to this, since each operation requires at least one and often two or more endian conversions. However, since this crate aims to minimize the cost of writing portable code, they are provided to reduce friction to adoption. If you are writing code that is extremely sensitive to such overhead, it might make sense to convert to native endian, do your operations, and then store back in the specified endian using .into() or similar. That said, the overhead is often very small, and Rust's optimizer is very good, so I would encourage you to do some actual benchmarking before taking an unergonomic approach to your code. There are too many traits implemented to list them here, so I recommend consulting the documentation. Alternatively, you could just try what you want to do, and see if it compiles. It shouldn't ever allow you to compile something that doesn't handle endianness correctly unless you work pretty hard at it.

Representations and ABI

You might notice that we used #[repr(C)] in the data struct above, and you might be wondering why. It is often the case that you want to write a struct that has a very specific layout when you are writing structures that will be directly read from and written to some medium. Rust's default representation does not guarantee this. For that reason, all of the structs defined in this crate are #[repr(transparent)], and it is strongly recommended, if you do plan to directly write these structures to disk or the network, that you do something similar to ensure a consistent layout or otherwise guarantee the order in which the fields are stored.
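The effect of these attributes can be checked with a small standard-library sketch. `BigEndianU32` below is a hypothetical stand-in for the crate's BigEndian<u32>, and `field_offsets` is a helper invented for the illustration.

```rust
// Layout sketch. `BigEndianU32` is a hypothetical stand-in for the crate's
// BigEndian<u32>; only the standard library is used.
use std::mem::size_of;

#[derive(Clone, Copy)]
#[repr(transparent)] // same layout as the wrapped u32
struct BigEndianU32(u32);

#[repr(C)] // fields are laid out in declaration order
struct NetworkConfig {
    address: BigEndianU32,
    mask: BigEndianU32,
    network: BigEndianU32,
}

/// Byte offset of each field from the start of the struct.
fn field_offsets(c: &NetworkConfig) -> [usize; 3] {
    let base = c as *const NetworkConfig as usize;
    [
        &c.address as *const BigEndianU32 as usize - base,
        &c.mask as *const BigEndianU32 as usize - base,
        &c.network as *const BigEndianU32 as usize - base,
    ]
}

fn main() {
    let config = NetworkConfig {
        address: BigEndianU32(0x0a00000au32.to_be()),
        mask: BigEndianU32(0xff000000u32.to_be()),
        network: BigEndianU32(0x0a000000u32.to_be()),
    };
    println!(
        "size: {}, offsets: {:?}",
        size_of::<NetworkConfig>(),
        field_offsets(&config)
    );
}
```

With #[repr(C)] the struct is 12 bytes and the fields sit at offsets 0, 4 and 8. Without it, the compiler is free to reorder fields.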

Operations on Types

In addition to offering support for ensuring that correct endianness is used by leveraging the Rust type system, this crate also provides implementations of a number of traits from the core library that allow you to work with values directly without converting them to native endian types first. In many cases, this is literally a zero-cost capability, because bitwise operations are endian-agnostic, and as long as you are using other SpecificEndian types, there is no overhead to doing operations on them directly. In cases where a conversion to native endian is necessary, the crate will perform the conversion, and return a value in the same type as the input.
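For operations that are not endian-agnostic, such as addition, the convert, operate, convert-back pattern described above might look roughly like this. It is a sketch under stated assumptions: `BigEndianU32` is a hypothetical stand-in type, and the crate's real trait implementations may be structured differently.

```rust
// Sketch of an arithmetic operator on a big-endian wrapper.
// `BigEndianU32` is a hypothetical stand-in, not the crate's type.
use std::ops::Add;

#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(transparent)]
struct BigEndianU32(u32); // the stored bits are always big-endian

impl BigEndianU32 {
    fn new(native: u32) -> Self {
        Self(native.to_be())
    }
    fn to_native(self) -> u32 {
        u32::from_be(self.0)
    }
}

impl Add for BigEndianU32 {
    type Output = Self;

    /// Carries cross byte boundaries, so the operands are converted to
    /// native endian, added, and the sum is stored back as big-endian.
    fn add(self, rhs: Self) -> Self {
        Self::new(self.to_native() + rhs.to_native())
    }
}

fn main() {
    let a = BigEndianU32::new(0x100);
    let b = BigEndianU32::new(0xff);
    let sum = a + b; // the result has the same type as the inputs
    println!("sum: {:x}", sum.to_native());
}
```

On a big-endian host both conversions are no-ops, so the wrapper costs nothing there.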

Features

Although this crate includes a lot of useful functionality up front, including it all can increase your compiled size significantly. By default, this crate will compile with all supported features, which means that almost anything you would want to do works out of the box, but it can also make the crate rather large. For size-conscious applications, it might be best to set default-features = false in your Cargo.toml, and add back in only the features you actually need.

The two most useful features are probably the ones that control support for big- and little-endian types:

  • big_endian
  • little_endian

Others are broken into categories:

  • Operation types - These can make the use of SpecificEndian<T> types more ergonomic, and allow for some amount of optimization by avoiding unnecessary conversions to and from native endian.
    • bitwise
    • comparisons
    • math_ops
    • neg_ops
    • shift_ops
  • Support for formatting in the format feature.
  • Support for different types
    • float_impls
    • integer_impls
    • byte_impls
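
As a rough illustration of a trimmed-down configuration, a Cargo.toml entry for a project that only needs big-endian integers with bitwise operators might look like the fragment below. The feature names come from the list above. The version requirement and the particular combination are assumptions, and some features may depend on others, so check that it builds for your use case.

```toml
# Hypothetical minimal feature selection; adjust to what your code uses.
[dependencies.simple_endian]
version = "0.3"
default-features = false
features = ["big_endian", "integer_impls", "bitwise"]
```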

Performance

For the most part, the endian operations are extremely fast, even compared to native operations. The main exception is the std::fmt implementations, which are in some cases quite a bit slower than the defaults. I'm open to suggestions on how to improve the performance, but it might be worth using .to_native() instead of directly printing the wrapped types in performance-critical contexts.

See Also

This crate allows for the manipulation of specific-endian structures in memory. It does not provide any facility for reading or writing those structures, which would probably be necessary in most use cases. See the following other crates for that functionality:

No runtime deps