3 unstable releases

new 0.2.0 Dec 10, 2024
0.1.1 Dec 5, 2024
0.1.0 Dec 4, 2024

#367 in Rust patterns

Download history 258/week @ 2024-12-02

258 downloads per month

MIT license

120KB
2K SLoC

shape: a decidable static shape system for JSON

This library implements a Rust-based type system powerful enough to represent any kind of JSON data, offering type-theoretic operations like simplification, satisfaction testing, child shape selection, union and intersection shapes, delayed shape binding, flexible error handling, and more.

[!CAUTION] This library is still in early-stage development, so you should not expect its API to be fully stable until the 1.0.0 release.

Installation

This crate provides a library, so installation means adding it as a dependency to your Cargo.toml file:

cargo add shape

Documentation

See the cargo doc-generated documentation for detailed information about the Shape struct and ShapeCase enum.

The Shape struct

pub(crate) type Ref<T> = std::sync::Arc<T>;

#[derive(Clone, PartialEq, Eq)]
pub struct Shape {
    case: Ref<ShapeCase>,
    case_hash: u64,
}

impl Shape {
    pub(crate) fn new_from_simplified(case: ShapeCase) -> Shape {
        let case = Ref::new(case);
        let case_hash = case.compute_hash();
        Shape { case, case_hash }
    }
}

To support flexible recombinations of shapes and their subshapes, the top-level Shape struct wraps a reference counted ShapeCase enum variant. Reference counting not only simplifies sharing subtrees among different Shape structures, but also prevents rustc from complaining about the Shape struct referring to itself without indirection.

Since this ShapeCase value is immutable, we can precompute and cache its hash in the case_hash field, then manually implement the std::hash::Hash trait to ignore the case field in favor of this cached case_hash field:

impl std::hash::Hash for Shape {
    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
        self.case_hash.hash(state);
    }
}

This allows Shape hashing to reuse previously computed subtree hashes, rather than rehashing the entire tree each time a hash is requested.

Obtaining a Shape

Instead of tracking whether a given ShapeCase has been simplified or not, we can simply mandate that Shape always wraps simplified shapes.

This invariant is enforced by restricting how Shape instances can be (publicly) created: all Shape instances must come either from calling ShapeCase::simplify, or from calling the crate-internal static method Shape::new_from_simplified.

impl ShapeCase {
    pub fn simplify(self) -> Shape {
        // Since this static method is crate-private, the ShapeCase::simplify
        // is the only public way to create a Shape.
        Shape::new_from_simplified(self.simplify_internal())
    }
}

lazy_static! {
    static ref BOOL_SHAPE: Shape = ShapeCase::Bool(None).simplify();
    // ... additional static Shape instances ...
}

impl Shape {
    // For convenience, Shape provides a number of static helper methods like
    // Shape::bool() and Shape::bool_value(true).
    pub fn bool() -> Shape {
        // A common BOOL_SHAPE can be returned whenever Shape::bool() is called.
        // This does not guarantee every Shape representing the Bool type will
        // share the memory of this BOOL_SHAPE, but it helps reduce unnecessary
        // memory allocations.
        BOOL_SHAPE.clone()
    }

    pub fn bool_value(value: bool) -> Shape {
        ShapeCase::Bool(Some(value)).simplify()
    }

    // Additional helpers...
    pub fn int() -> Shape;
    pub fn int_value(value: i64) -> Shape;
    pub fn float() -> Shape;
    pub fn string() -> Shape;
    pub fn string_value(value: &str) -> Shape;
    pub fn null() -> Shape;
    pub fn none() -> Shape;
    pub fn empty_object() -> Shape;
    pub fn empty_map() -> IndexMap<String, Shape>;
    pub fn object(fields: IndexMap<String, Shape>, rest: Option<Shape>) -> Shape;
    pub fn array(prefix: &[Shape], tail: Option<Shape>) -> Shape;
    pub fn tuple(shapes: &[Shape]) -> Shape;
    pub fn list(of: Shape) -> Shape;
    pub fn empty_array() -> Shape;
    pub fn one(shapes: &[Shape]) -> Shape;
    pub fn all(shapes: &[Shape]) -> Shape;
    pub fn error(message: &str) -> Shape;
    pub fn error_with_range(message: &str, range: OffsetRange) -> Shape;
    pub fn error_with_partial(message: &str, partial: Shape) -> Shape;
    pub fn error_with_partial_and_range(message: &str, partial: Shape, range: OffsetRange) -> Shape;
}

Notice that ShapeCase::simplify takes ownership of its input self value. If this is not the behavior you want, you can always clone your ShapeCase value before simplifying it, thereby simplifying only the clone. However, you cannot use an unsimplified ShapeCase as a child of another ShapeCase value, since child shapes are always wrapped with Shape.

Testing supershape-subshape acceptance

To test whether the set of all values accepted by one Shape is a subset of the set of all values accepted by another Shape, use the supershape.accepts(&subshape) -> bool method, or its inverse subshape.satisfies(&supershape) -> bool.

For example, a Shape::one union shape accepts any member shape of the union,

let int_string_union = Shape::one(&[Shape::int(), Shape::string()]);

assert!(int_string_union.accepts(&Shape::int()));
assert!(int_string_union.accepts(&Shape::string()));
assert!(Shape::int().satisfies(&int_string_union));
assert!(Shape::string().satisfies(&int_string_union));

assert_eq!(Shape::int().satisfies(&int_string_union), false);
assert_eq!(Shape::string().satisfies(&int_string_union), false);
assert_eq!(int_string_union.accepts(&Shape::float()), false);

Error satisfaction

A ShapeCase::Error variant generally represents a failure of shape processing, but it can also optionally report Some(partial) shape information in cases when there is a likely best guess at what the shape should be.

For this reason, a ShapeCase::Error shape either satisfies/accepts itself trivially (according to == equality), or it can define a partial shape to satisfy shapes that accept that partial shape.

This partial: Option<Shape> field allows errors to provide guidance (potentially with chains of multiple errors) without interfering with the accepts/satisfies logic.

let error = Shape::error_with_partial("Expected an integer", Shape::int());

assert!(error.accepts(&Shape::int()));
assert_eq!(error.accepts(&Shape::float()), false);

assert!(Shape::int_value(42).satisfies(&error));
assert_eq!(Shape::float().satisfies(&error), false);

The null singleton and the None shape

ShapeCase::Null represents the singleton null value found in JSON. It satisfies and accepts only itself and no other shapes, except unions that allow null as a member, or errors that wrap null as a partial shape.

ShapeCase::None represents the absence of a value, and is often used to represent optional values. Like null, None is satisfied by (accepts) only itself and no other shapes (except unions that include None as a member, or errors that wrap None as a partial shape for some reason).

When either null or None participate in a Shape::one union shape, they are usually preserved (other than being deduplicated) because they represent distinct possibilities. However, ::Null and ::None do have a noteworthy difference of behavior when simplifying ::All intersection shapes.

When null participates in a ShapeCase::All intersection shape, it "poisons" the intersection and causes the whole thing to simplify to null. This allows a single intersection member shape to override the whole intersection, which is useful for reporting certain kinds of error conditions (especially in GraphQL).

By contrast, None does not poison intersections, but is simply ignored. This makes sense if you think of Shape::all intersections as merging their member shapes: when you merge None with another shape, you get the other shape back, because None imposes no additional expectations.

Dependencies

~6.5–9MB
~160K SLoC