2 releases

0.1.1 Sep 8, 2023
0.1.0 Sep 7, 2023

#2438 in Data structures

MIT license

51KB
761 lines

Justly

This is a library for “justified containers”, inspired by the justified-containers library in Haskell.

Briefly, this library might help you wherever you’re using indices instead of references.

Check out the rustdoc documentation for src/lib.rs for the main description of Justly and information about using it.

Also see the meta/ directory to learn more about the structure of this repo.


lib.rs:

Collections that don’t need extra bounds-checking.

Overview

Justly implements “justified” wrappers for the standard Rust data structures in std::collections. A justified container is one whose keys, such as the usize indices into a [Vec], keep track of their validity. This has a few major consequences, listed below.

Trustworthiness

Justly is built on the idea of trustworthy knowledge: call::Call represents a value called by a name, which lets you encode knowledge about that named thing, such as key::Key, an index that you know is valid. If you uphold the safety properties described in these APIs, then such knowledge is called trustworthy, and so you can rely on it to build safe abstractions. Now unchecked indexing can be well and truly safe!

No extra checks

A “checked” method is one that performs bounds checking, and an “unchecked” method is one that unsafely omits it. Methods in Justly that safely omit bounds checks are called check-free. The Rust compiler attempts to elide checks when it can prove that they will always succeed. Justly implements check-free APIs by giving you the tools to prove that checks can be skipped.

It’s worth noting that some checks are strictly necessary! For example, you must do a runtime test to tell whether an arbitrary integer index is a valid key. Justly only lets you avoid extra checks, which are not strictly necessary, yet the compiler cannot prove it. But once you have gotten a typed key, it proves that the test has already been done, and doesn’t need to be repeated.

Also, most indices that you get from the collection APIs are already known to be valid keys, so Justly merely adds that information in its wrapper APIs.

No invalid indices

When referring to elements in a collection, Rust encourages using indices, relative to the collection, instead of absolute pointers. When a container is mutable, mutating methods can reallocate the container’s storage if its capacity is exceeded, thereby invalidating all pointers to that storage. So relative indexing makes it easier to uphold memory safety compared to absolute addressing, because indices automatically respond to capacity changes.

However, relative indices still don’t automatically handle changes in size: any method that decreases the size of the container must naturally invalidate any indices referring to elements outside of its new bounds.

Worse, since indices are just bare integers, they come with no guarantees about the relationship between the index and the value at that index. For example, suppose we have a vector, v, and we compute some indices into it, i and j.

let mut v = vec![1, 7, 16, 27, 40, 55];

let i = v.iter().enumerate().position(|(k, &v)| k * 10 == v).unwrap();
let j = v.iter().enumerate().position(|(k, &v)| k * 11 == v).unwrap();

println!("{}", v[i]);  // valid (40)
println!("{}", v[j]);  // valid (55)

If we remove() an element, then this can easily break code that makes assumptions about the indices: remove() not only makes j inaccessible, but also silently changes the meaning of i. This is like a use-after-free error.

v.remove(2);

println!("{}", v[i]);  // logic error (55)
println!("{}", v[j]);  // runtime error (panic)

And of course if we push() an element, the problem is compounded: it silently makes j accessible again, but with a different meaning as well.

v.push(68);

println!("{}", v[j]);  // logic error (68)

So Justly wraps those mutating methods that could invalidate the keys of a collection in consuming methods that give back a modified object, along with information about how the new keys are related to the old ones.

We do this with phantom type parameters, so there is no runtime cost.

About this documentation

Wrapper types and methods are annotated with tags that describe their purpose, which are marked with curly braces {} for easier searching.

{free}

Marks methods that used to require bounds-checking on the original type, but are now check-free.

{just}

On a container type, this says that the type is a wrapper for a plain container where some methods use typestate to add safety.

On a method, it marks places where we change the signature, typically either:

  • Taking ownership of self to replace a mut method on the wrapped type with one that tracks knowledge about the container

  • Adding wrappers such as key::Key to record some knowledge about the result

{like}

Links to the original wrapped thing.

{safe}

Marks things that used to be unsafe (usually unchecked) but are now safe (implying check-freedom).

TODO

Allow naming unsized things

call::Call and link::Link ask for T: Sized, but they don’t really need to, since all the other fields are meant to be phantoms.

Allocator support

To support allocators, there should be a type parameter on the wrappers, such as A = std::alloc::Global.

Implicit dereferencing

We’re uneasy about whether to add implementations of Deref and DerefMut that would let a value be implicitly called by a name, like these for Vec:

impl<'name, T> Deref for Vec<'name, T> {
    type Target = Call<'name, vec::Vec<T>>;
    fn deref(&self) -> &Call<'name, vec::Vec<T>> {
        &self.same
    }
}

impl<'name, T> DerefMut for Vec<'name, T> {
    fn deref_mut(
        &mut self,
    ) -> &mut Call<'name, vec::Vec<T>> {
        &mut self.same
    }
}

Tracking capacity

At the time of this writing, these APIs only track the validity of indices. This means that if std::collections has a mutating method that doesn’t change any indices, we can just forward it and keep the same API, which is likely better for usability.

However, in principle we could just as well track the capacity too. This would reflect the std::collections guarantees about allocation at the type level, making it possible in some circumstances to statically guarantee that code will not allocate. But it would come at the ergonomic cost of using a different API for functions that might change the capacity.

Static, type-level names for runtime values.

Names are represented using lifetime variables. For more on why, see [mod@crate::call]. Helper for unique names. name ≡ value relationships: define a unique static name for a dynamic value.

To make a new name, we have to make up a new type-level constant. which is both statically known and unique.

Often a “type constant” wants to be an existential type. However, dyn Trait types aren’t statically known, and impl Trait types aren’t necessarily unique. Happily, we can write rank-1 existential quantification using rank-2 universal quantification, which is allowed for lifetime variables.

So we can write the name as a lifetime. That also lets us skip writing it sometimes, thanks to lifetime elision.

Gives names to values. name × value relationships: pair a static name with a dynamic value. Common properties of collections. {just} {like}:[mod@std::vec] {like}:[mod@std::slice]

Vec<'name, T> is a thin wrapper around a named vector, Call<'name, std::vec::Vec<T>>, which you can make using the crate::called::Called trait.

As is typical for this library, the Vec wrapper replaces some methods with versions that track the state of the container to prove that unchecked indexing is safe, and offer some additional features enabled by this.

The types Slice and MutSlice are similar thin wrappers for slices, and likewise ConstPtr and MutPtr for pointers. {like}:std::collections::vec_deque {like}:std::collections::linked_list {like}:std::collections::hash_map {like}:std::collections::btree_map {like}:std::collections::binary_heap Indices known to be valid keys in collections. Static proofs.

No runtime deps