0.1.0 Jul 6, 2022

Sidex

Sidex is a format- and language-agnostic data structure and API definition language with a focus on simplicity, extensibility, and developer ergonomics. Sidex aims to simplify data exchange between different programming languages and platforms via potentially multiple serialization formats.

🚧 Status: Although we already use Sidex in production, it is still experimental. Use at your own risk!

✨ Features

  • Schema-first definition of data structures and RPC-like APIs.
  • Designed for format- and language-agnostic definitions.
  • Modern algebraic data types and non-null by default.
  • Extensible with user-defined opaque types.
  • Designed for interoperability, e.g., with JSON Schema.
  • VS Code extension for increased productivity.
  • Out-of-the-box support for Rust, TypeScript, and JSON.

🚀 Getting Started

Sidex is currently distributed via Cargo and crates.io. To install Sidex, run:

cargo install sidex-cli

Then, to create a new Sidex definition named my_def, run:

sidex new my_def

Every Sidex definition consists of a flat collection of modules located in the modules directory. Here is a simple example of a module you could place in the file person.sidex:

opaque Uuid  // This is an opaque user-defined type.

alias PersonId: Uuid  // This is a type alias.

enum Role {
    Admin,
    User,
}

struct Person {
    id: PersonId,
    name: string,
    email?: string,  // This field is optional.
    role: Role,
    children: [PersonId],  // A sequence of person ids.
}

enum GetPersonResult {
    NotFound,
    Found: Person,
}

fun get_person_by_id(id: PersonId) -> GetPersonResult

To check a definition for validity, run:

sidex check

Please have a look at the recipes for further examples on how to use Sidex.

⚙️ The Sidex Language

At the core of Sidex is the Sidex language for defining data types and function types.

The core of Sidex is only concerned with such types and nothing else.

📦 Data Types

Sidex is based on five kinds of data types:

  • Opaque types are opaque to Sidex, i.e., their internal structure is a black box.

    Opaque types are defined with the opaque keyword. Opaque types are nominal, i.e., opaque types defined separately are always distinct even if they have the same name.

  • Enumeration types define unions with tagged variants of different types.

    Enumeration types are defined with the enum keyword. Enumeration types are nominal, i.e., enumeration types defined separately are always distinct even if they agree on all their variants.

  • Struct types define structures with labeled fields of different types.

    Struct types are defined with the struct keyword. Struct types are nominal, i.e., struct types defined separately are always distinct even if they agree on all their fields.

  • Sequence types define sequences of elements of the same type.

    Sequence types are created with [T] where T denotes the element type. Sequence types are structural, i.e., two sequence types with the same element type are identical.

  • Map types define mappings from keys of some type to values of some type.

    Map types are created with [K: V] where K denotes the key type and V denotes the value type. Map types are structural, i.e., two map types with the same key and value type are identical.
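The distinction between nominal and structural types can be illustrated with a rough Rust analogy. This is only a sketch: the names Meters, Seconds, total, and distance_total are made up for illustration and are not produced by Sidex.

```rust
// Hypothetical analogy in Rust; none of these names come from Sidex.

// Nominal: two structs with identical fields are still distinct types.
#[derive(Debug, Clone, PartialEq)]
pub struct Meters {
    pub value: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Seconds {
    pub value: u32,
}

// Structural: every `Vec<u32>` is the same type, no matter where it is written,
// much like two Sidex sequence types `[u32]` are identical.
pub fn total(values: Vec<u32>) -> u32 {
    values.iter().sum()
}

// Accepts only `Meters`; passing a slice of `Seconds` would not compile,
// even though both structs have exactly the same fields.
pub fn distance_total(distances: &[Meters]) -> u32 {
    let raw: Vec<u32> = distances.iter().map(|m| m.value).collect();
    total(raw)
}
```

Calling distance_total with a slice of Seconds values is rejected by the compiler, which mirrors how separately defined Sidex struct types stay distinct even if they agree on all their fields.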

Sidex comes with built-in primitive types for strings, integers, and booleans. Technically, these primitive types are not any different from user-defined opaque types. These primitive types are:

  • string: For sequences of Unicode code points.
  • i8, i16, i32, i64: For signed integers of different bit width.
  • u8, u16, u32, u64: For unsigned integers of different bit width.
  • bool: For booleans.

In addition, there is the void type for indicating the absence of any data.

Using opaque types, you can define your own primitives, e.g., for UUIDs:

opaque Uuid

The structure of opaque types can be specified externally, e.g., using JSON Schema.

📡 Function Types

Taking inspiration from RPC and FFI, Sidex allows defining function types with the fun keyword. Every function type consists of a sequence of named arguments, each with its own type, and a return type. At its core, Sidex does not presuppose any protocol or other mechanism for invoking such functions.
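As a rough sketch of what this could look like in a target language, the function type get_person_by_id from the example module might be written by hand as a Rust trait. Everything below is an assumption made for illustration: the trait PersonApi, the InMemoryPeople type, and the simplified Person are not generated by Sidex, and PersonId is a plain String standing in for the opaque Uuid.

```rust
use std::collections::HashMap;

// Hypothetical hand-written mapping, not generated Sidex output.
pub type PersonId = String; // stands in for the opaque `Uuid`

// Simplified version of the `Person` struct from the example module.
#[derive(Debug, Clone, PartialEq)]
pub struct Person {
    pub id: PersonId,
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum GetPersonResult {
    NotFound,
    Found(Person),
}

// `fun get_person_by_id(id: PersonId) -> GetPersonResult` as a trait method.
// The trait fixes only the signature, not how the call is transported.
pub trait PersonApi {
    fn get_person_by_id(&self, id: PersonId) -> GetPersonResult;
}

// One possible invocation mechanism: a plain in-process lookup.
pub struct InMemoryPeople {
    pub people: HashMap<PersonId, Person>,
}

impl PersonApi for InMemoryPeople {
    fn get_person_by_id(&self, id: PersonId) -> GetPersonResult {
        match self.people.get(&id) {
            Some(person) => GetPersonResult::Found(person.clone()),
            None => GetPersonResult::NotFound,
        }
    }
}
```

Because the trait only captures the signature, the same definition could just as well be backed by an HTTP client or an FFI call; the in-memory implementation is merely the simplest choice for a sketch.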

🤝 Exchanging Data

Data exchange can be quite complex and involves multiple concerns which Sidex aims to separate.

📜 Language Mapping

To be useful, Sidex definitions need to be mapped to type or class definitions of some programming language, e.g., Rust or TypeScript. We refer to such a mapping as a language mapping:

┌──────────────────┐   Language Mapping    ┌─────────────────┐
│ Sidex Definition │ ────────────────────► │ Target Language │
└──────────────────┘                       └─────────────────┘

Note that a language mapping may involve tradeoffs. For instance, in the case of TypeScript, a map type can be mapped either to Object or to Map, and, in the case of Rust, there are also multiple different types of maps available, e.g., HashMap or BTreeMap. Furthermore, depending on the language, certain data types may not be mappable at all due to language-specific constraints.

Hence, the goal of the Sidex project is to provide tools and infrastructure for mapping Sidex definitions to different programming languages without imposing any particular mapping. Using the sidex crate as a basis, you can define your own mappings and even generate additional boilerplate such as constructors and getters. If something cannot be sensibly mapped, a tool is free to generate an error as a last resort.
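To make the tradeoffs concrete, here is one way the person.sidex module from Getting Started could be mapped to Rust by hand. It is a sketch under assumptions, not the output of any Sidex tool: representing Uuid as a 16-byte newtype, the aliases NamesUnordered and NamesOrdered, and the helper names_in_key_order are all choices made for this example.

```rust
use std::collections::{BTreeMap, HashMap};

// Hypothetical hand-written mapping of `person.sidex`; a generator may choose differently.

/// `opaque Uuid`, represented here as a newtype around 16 raw bytes (an assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct Uuid(pub [u8; 16]);

/// `alias PersonId: Uuid`
pub type PersonId = Uuid;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Admin,
    User,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Person {
    pub id: PersonId,
    pub name: String,
    pub email: Option<String>,   // `email?: string`
    pub role: Role,
    pub children: Vec<PersonId>, // `[PersonId]`
}

// A Sidex map type such as `[PersonId: string]` admits more than one Rust mapping.
pub type NamesUnordered = HashMap<PersonId, String>; // keys need `Hash + Eq`
pub type NamesOrdered = BTreeMap<PersonId, String>;  // keys need `Ord`; iteration is sorted

/// Returns the names sorted by key, which `BTreeMap` provides for free.
pub fn names_in_key_order(names: &NamesOrdered) -> Vec<&str> {
    names.values().map(String::as_str).collect()
}
```

The two map aliases show the kind of decision a mapping has to make: HashMap places fewer ordering demands on the key type, while BTreeMap yields a deterministic iteration order.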

Sidex aims to provide mappings for some languages out-of-the-box with sane defaults.

Note that a language mapping is itself completely independent from how data may be serialized and how functions may be invoked. It can also be useful without ever exchanging any data.

📩 Serialization Formats

To exchange data between different languages, it needs to be serialized into some common format. To this end, a format mapping from a Sidex definition to the serialization format is necessary:

┌──────────────────┐   Format Mapping    ┌──────────────────────┐
│ Sidex Definition │ ──────────────────► │ Serialization Format │
└──────────────────┘                     └──────────────────────┘

Note that the format mapping is supposed to be language-independent. It merely describes how certain Sidex types are mapped to the serialization format and its types.

Again, Sidex does not impose any restrictions on the serialization format. However, it aims to provide some out-of-the-box mappings to common formats with sane defaults.

For user-defined opaque types, specific format mappings have to be provided.

🔗 Serialization Binding

Once we have fixed a language mapping and a format mapping, we need to bind both together using a serialization binding. A serialization binding is language-specific and format-specific. It takes serialized data as per the format mapping and transforms it into data structures as per the language mapping (known as deserialization) and vice versa (known as serialization).
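A minimal sketch of such a binding, written by hand for the Role enum and JSON, could look as follows. The format mapping assumed here (a variant without data is encoded as a JSON string holding the variant name) is an assumption for this example and not necessarily what Sidex's own JSON support does; role_to_json, role_from_json, and DecodeError are hypothetical names.

```rust
// Hypothetical binding between a Rust mapping of `Role` and JSON text.
// Assumed format mapping: a variant without data becomes a JSON string
// holding the variant name, e.g. "Admin".

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Admin,
    User,
}

#[derive(Debug, PartialEq, Eq)]
pub struct DecodeError(pub String);

/// Serialization: Rust value -> JSON text.
pub fn role_to_json(role: Role) -> String {
    let name = match role {
        Role::Admin => "Admin",
        Role::User => "User",
    };
    format!("\"{}\"", name)
}

/// Deserialization: JSON text -> Rust value.
/// Only the two exact string literals are accepted; surrounding whitespace is ignored.
pub fn role_from_json(text: &str) -> Result<Role, DecodeError> {
    match text.trim() {
        "\"Admin\"" => Ok(Role::Admin),
        "\"User\"" => Ok(Role::User),
        other => Err(DecodeError(format!("unexpected JSON for Role: {}", other))),
    }
}
```

In practice a binding would usually be generated or built on a serialization library rather than written by hand; the point of the sketch is only that it depends on both the chosen Rust types and the chosen JSON encoding.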

🤔 Rationale

Why schema-first?

A schema-first approach has multiple advantages over definitions in a programming language: (1) It allows focusing on the important aspects of the data being exchanged. (2) It allows developing tooling independent of any programming language. (3) It enables the independent evolution and adaptation of the definition language. (4) It can be used independently of a particular programming language.

Why yet another language?

Existing approaches are often specific to certain serialization formats, do not explicitly support algebraic data types, do not support arbitrary user-defined opaque types, have nullable fields by default, and/or are overly complex because they support many more structures and types than Sidex.

⚖️ Licensing

Sidex is licensed under MIT. Unless you explicitly state otherwise, any contributions intentionally submitted for inclusion in this project shall be licensed under MIT without any additional terms or conditions.


Made with ❤️ by Silitics.
