17 releases

0.2.15 Sep 11, 2023
0.2.14 Mar 8, 2023
0.2.13 Dec 27, 2022
0.2.11 Nov 4, 2022
0.2.3 Nov 18, 2020

#119 in Memory management

9,520 downloads per month
Used in 29 crates (11 directly)

MIT/Apache

45KB
973 lines

Heap data estimator.

The datasize crate allows estimating the amount of heap memory used by a value. It does so by providing or deriving an implementation of the DataSize trait, which knows how to calculate the size for many std types and primitives.

The aim is to get a reasonable approximation of memory usage, especially with variably sized types like Vecs. While it is acceptable to be a few bytes off in some cases, any user should be able to easily tell whether their memory is growing linearly or logarithmically by glancing at the reported numbers.

The crate does not take alignment or memory layout into account, nor any unusual behavior or optimizations of allocators. The estimate depends entirely on the data inside the type, hence the name of the crate.

General usage

For any type that implements DataSize, the data_size convenience function can be used to estimate the size of its heap allocation:

use datasize::data_size;

let data: Vec<u64> = vec![1, 2, 3];
#[cfg(feature = "std")]
assert_eq!(data_size(&data), 24);
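
The 24 in the assertion above corresponds to three u64 values at 8 bytes each. The following is a minimal sketch of that arithmetic using only the standard library. It assumes that the estimate for a Vec of primitives is based on the allocated capacity, which is an assumption about the crate's internals rather than something stated here, and approx_vec_heap is a hypothetical helper that is not part of datasize:

```rust
use std::mem::size_of;

/// Hypothetical helper, not part of `datasize`: approximates the heap bytes
/// held by a `Vec` of plain values as capacity * element size.
/// Assumption: allocator overhead and alignment padding are ignored.
fn approx_vec_heap<T>(v: &Vec<T>) -> usize {
    v.capacity() * size_of::<T>()
}

fn main() {
    let data: Vec<u64> = vec![1, 2, 3];
    // `vec![a, b, c]` allocates exactly the requested capacity of 3.
    assert_eq!(approx_vec_heap(&data), 24);

    // An empty `Vec` has not allocated anything yet; its pointer, capacity
    // and length fields are part of the value itself, not heap data.
    let empty: Vec<u64> = Vec::new();
    assert_eq!(approx_vec_heap(&empty), 0);
}
```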

Types implementing the trait also provide two additional constants, IS_DYNAMIC and STATIC_HEAP_SIZE.

IS_DYNAMIC indicates whether a value's size can change over time:

use datasize::DataSize;

#[cfg(feature = "std")]
// A `Vec` of any kind may have elements added or removed, so it changes size.
assert!(Vec::<u64>::IS_DYNAMIC);

// The elements of type `u64` in it are not dynamic. This allows the implementation to
// simply estimate the size as number_of_elements * size_of::<u64>.
assert!(!u64::IS_DYNAMIC);

Additionally, STATIC_HEAP_SIZE indicates the amount of heap memory a type will always use. A good example is Box<u64>: it always uses 8 bytes of heap memory and never changes in size:

use datasize::DataSize;

#[cfg(feature = "std")]
assert_eq!(Box::<u64>::STATIC_HEAP_SIZE, 8);
#[cfg(feature = "std")]
assert!(!Box::<u64>::IS_DYNAMIC);

Overriding derived data size calculation for single fields

On structs (but not enums!), the heap size calculation can be overridden for individual fields. This is useful when a field's type comes from a third-party crate and does not implement DataSize: annotate the field with #[data_size(with = ...)] and point it to a Fn(&T) -> usize function, where T is the type of the field:

use datasize::DataSize;

// Let's pretend this type is from a foreign crate.
struct ThirdPartyType;

fn estimate_third_party_type(value: &Vec<ThirdPartyType>) -> usize {
    // We assume every item is 512 bytes in heap size.
    value.len() * 512
}

#[cfg(feature = "std")]
#[derive(DataSize)]
struct MyStruct {
    items: Vec<u32>,
    #[data_size(with = estimate_third_party_type)]
    other_stuff: Vec<ThirdPartyType>,
}

This automatically marks the whole struct as always dynamic, so the custom estimation function is called every time MyStruct is sized.

Implementing DataSize for custom types

The DataSize trait can be implemented for custom types manually:

use datasize::{data_size, DataSize};

struct MyType {
    items: Vec<i64>,
    flag: bool,
    counter: Box<u64>,
}

#[cfg(feature = "std")]
impl DataSize for MyType {
    // `MyType` contains a `Vec`, so `IS_DYNAMIC` is set to true.
    const IS_DYNAMIC: bool = true;

    // The only always present heap item is the `counter` value, which is 8 bytes.
    const STATIC_HEAP_SIZE: usize = 8;

    #[inline]
    fn estimate_heap_size(&self) -> usize {
        // We can be lazy here and delegate to all the existing implementations:
        data_size(&self.items) + data_size(&self.flag) + data_size(&self.counter)
    }
}

let my_data = MyType {
    items: vec![1, 2, 3],
    flag: true,
    counter: Box::new(42),
};

#[cfg(feature = "std")]
// Three i64 and one u64 on the heap sum up to 32 bytes:
assert_eq!(data_size(&my_data), 32);

Since implementing this for struct types is cumbersome and repetitive, the crate provides a DataSize macro for convenience:

use datasize::DataSize;

// Equivalent to the manual implementation above:
#[cfg(feature = "std")]
#[derive(DataSize)]
struct MyType {
    items: Vec<i64>,
    flag: bool,
    counter: Box<u64>,
}

See the DataSize macro documentation in the datasize_derive crate for details.

Performance considerations

Determining the full size of data can be quite expensive, especially if multiple nested levels of dynamic types are used. The crate uses IS_DYNAMIC and STATIC_HEAP_SIZE to optimize when it can, so in many cases not every element of a vector needs to be checked individually.

However, if the contained types are dynamic, every element must (and will) be checked, so keep this in mind when performance is an issue.
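
To make the shortcut concrete, here is a minimal sketch of how a Vec implementation could use the two constants. It does not use datasize: HeapEstimate is a hypothetical stand-in for the DataSize trait, and the bodies below are assumptions about how such an implementation might look, not the crate's actual code:

```rust
use std::mem::size_of;

/// Hypothetical stand-in for `DataSize`, reduced to the parts discussed above.
trait HeapEstimate {
    const IS_DYNAMIC: bool;
    const STATIC_HEAP_SIZE: usize;
    fn estimate_heap_size(&self) -> usize;
}

impl HeapEstimate for u64 {
    const IS_DYNAMIC: bool = false;
    const STATIC_HEAP_SIZE: usize = 0;
    fn estimate_heap_size(&self) -> usize {
        0
    }
}

impl<T: HeapEstimate> HeapEstimate for Box<T> {
    // A box never changes size unless its content does.
    const IS_DYNAMIC: bool = T::IS_DYNAMIC;
    const STATIC_HEAP_SIZE: usize = size_of::<T>() + T::STATIC_HEAP_SIZE;
    fn estimate_heap_size(&self) -> usize {
        size_of::<T>() + (**self).estimate_heap_size()
    }
}

impl<T: HeapEstimate> HeapEstimate for Vec<T> {
    const IS_DYNAMIC: bool = true;
    const STATIC_HEAP_SIZE: usize = 0;
    fn estimate_heap_size(&self) -> usize {
        // The buffer holding the elements themselves.
        let buffer = self.capacity() * size_of::<T>();
        let nested = if T::IS_DYNAMIC {
            // Slow path: every element has to be visited.
            self.iter().map(|item| item.estimate_heap_size()).sum::<usize>()
        } else {
            // Fast path: one multiplication, no per-element work.
            self.len() * T::STATIC_HEAP_SIZE
        };
        buffer + nested
    }
}

fn main() {
    // Static elements without heap data of their own.
    let flat: Vec<u64> = vec![1, 2, 3];
    assert_eq!(flat.estimate_heap_size(), 24);

    // Static elements that each own 8 heap bytes: still the fast path.
    let boxed: Vec<Box<u64>> = vec![Box::new(1), Box::new(2)];
    assert_eq!(boxed.estimate_heap_size(), 2 * size_of::<Box<u64>>() + 16);

    // Dynamic elements: each inner `Vec` is checked individually.
    let nested: Vec<Vec<u64>> = vec![vec![1, 2], vec![3]];
    assert_eq!(nested.estimate_heap_size(), 2 * size_of::<Vec<u64>>() + 16 + 8);
}
```

In this sketch the cost of sizing a Vec<u64> or Vec<Box<u64>> does not depend on its length, while sizing a Vec<Vec<u64>> is linear in the number of inner vectors.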

Handling references, Arcs and similar types

Any reference will be counted as having a data size of 0, as it does not own the value. There are some special reference-like types like Arc, which are discussed below.

Arc and Rc

Arcs are currently not supported. A planned development is to allow users to mark one instance of an Arc as "primary" and have its heap memory usage counted, but this is not implemented yet.

Any Arc will be estimated to have a heap size of 0, to avoid cycles resulting in infinite loops.

The Rc type is handled in the same manner.
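
Besides cycles, shared ownership also raises the question of which handle should be charged for the allocation. The following small sketch uses only the standard library (not datasize) to show how a per-handle count would overstate memory usage; naive_shared_size is a hypothetical helper written for this illustration:

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Hypothetical naive estimator: charges the full payload to every handle.
fn naive_shared_size(handles: &[Arc<[u8; 1024]>]) -> usize {
    handles.len() * size_of::<[u8; 1024]>()
}

fn main() {
    let first = Arc::new([0u8; 1024]);
    let handles = vec![first.clone(), first.clone(), first];

    // All three handles point at the same allocation.
    assert!(Arc::ptr_eq(&handles[0], &handles[2]));
    assert_eq!(Arc::strong_count(&handles[0]), 3);

    // Only 1024 payload bytes exist, but the naive sum reports 3072.
    assert_eq!(naive_shared_size(&handles), 3 * 1024);
}
```

Counting 0 for every handle errs in the opposite direction, which is why shared data can be missing from the reported numbers.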

Additional types

Some additional types from external crates are available behind feature flags.

  • fake_clock-types: Support for the fake_instant::FakeClock type.
  • futures-types: Some types from the futures crate.
  • smallvec-types: Support for the smallvec::SmallVec type.
  • tokio-types: Some types from the tokio crate.

no_std support

Although it is slightly paradoxical, since without std or at least alloc there is usually no heap at all, the crate supports no_std environments. Disabling the "std" feature (by disabling default features) produces a version of the crate that does not rely on the standard library. This can be used to derive the DataSize trait for types without boilerplate, even though their heap size will usually be 0.

Arrays and const generics

By default, this crate requires at least Rust version 1.51.0, in order to implement DataSize for [T; N] arrays generically. This implementation is provided by the "const-generics" feature flag, which is enabled by default. In order to use an older Rust version, you can specify default-features = false and features = ["std"] for datasize in your Cargo.toml.

When the const-generics feature flag is disabled, a DataSize implementation will be provided for arrays of small sizes, and for some larger sizes related to powers of 2.
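
As an illustration of why const generics matter here, the sketch below shows a single generic implementation covering arrays of every length. It does not use datasize: HeapEstimate is a hypothetical, trimmed-down stand-in for the DataSize trait, and the String implementation is an assumption made for the example:

```rust
/// Hypothetical stand-in for `DataSize`, trimmed to one method.
trait HeapEstimate {
    fn estimate_heap_size(&self) -> usize;
}

impl HeapEstimate for u8 {
    fn estimate_heap_size(&self) -> usize {
        0
    }
}

impl HeapEstimate for String {
    fn estimate_heap_size(&self) -> usize {
        // Assumption for this sketch: a string owns `capacity` heap bytes.
        self.capacity()
    }
}

// One impl for every array length `N`; this syntax needs Rust 1.51 or newer.
impl<T: HeapEstimate, const N: usize> HeapEstimate for [T; N] {
    fn estimate_heap_size(&self) -> usize {
        // The array is stored inline; only its elements may own heap data.
        self.iter().map(|item| item.estimate_heap_size()).sum()
    }
}

fn main() {
    // An arbitrary length works without a dedicated impl.
    let bytes = [0u8; 37];
    assert_eq!(bytes.estimate_heap_size(), 0);

    let names = [String::from("alpha"), String::from("beta")];
    let expected = names[0].capacity() + names[1].capacity();
    assert_eq!(names.estimate_heap_size(), expected);
}
```

Without const generics, each supported length would need its own impl block, typically generated by a macro, which is why only a fixed set of lengths can be covered in that configuration.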

Known issues

The derive macro currently does not support generic structs with inline type bounds, e.g.

struct Foo<T: Copy> { ... }

This can be worked around by using an equivalent where clause:

struct Foo<T>
where T: Copy
{ ... }

Dependencies

~1.1–2.4MB
~50K SLoC