1 unstable release
0.1.0 | May 5, 2021 |
---|
#2838 in Parser implementations
200KB
5.5K
SLoC
Verilization
Verilization is a serilization description language designed for defining binary file formats. Unlike other serialization tools such as Protocol Buffers, serialized Verilization data is not forward or backward compatible. Instead, conversions from older format versions are made easy, allowing for more compact data and more control over the underlying shape of the data.
Goals
Verilization has the following primary goals.
- Give maximum control of the file format to the user
- Define the format in a language-independent manner
- Support easy conversions from older versions of the format
Other less high-level goals.
- Support bigint types
- Embedable in other languages without using native binaries
Types
The following types are supported.
Type | Encoding |
---|---|
struct types |
The encoding of each field in order |
enum types |
A tag (encoded in the same format as nat ) followed by the encoding of the field represented by the tag |
extern types |
Defined by code written in the target language |
Structs
A struct
type is defined with multiple versions. Each version defines a list of fields.
struct Rectangle {
version 1 {
width: u32;
height: u32;
}
}
Enums
An enum
type is defined with multiple versions. Each version defines a list of fields used as cases. An enum value consists of exactly one of these fields.
struct StringOrInt {
version 1 {
str: string;
num: int;
}
}
Externs
An extern
type is defined in user code. The type definition, conversions, and codecs must be implemented in the target language.
An extern
type may declare what literals may be used for the type.
extern MyString {
literal {
string;
}
}
The following literal specifications are supported.
Name | Example | Syntax | Notes |
---|---|---|---|
Integer | integer [0, 256) | 'integer' open_bracket integer_literal? ',' integer_literal? close_bracket where `open_bracket : '[' |
'('and close_bracket : ']' |
String | string |
'string' |
The contents of the string cannot be restricted. |
Sequence | sequence T |
'sequence' type_expr |
Defines a sequence of the specified type. |
Case | case Positive() |
'case' identifier '(' [ type_expr { ',' type_expr } ] ')' |
Defines a case. Multiple case literals may be specified if the names are distinct. |
Record | record { a: A; b: B; } |
'record' '{' { identifier ':' type_expr ';' } '}' |
Defines a record. |
Runtime Library Types
There are a number of extern
types provided by the runtime library.
Type | Literals | Encoding |
---|---|---|
{i,u}{8,16,32,64} |
Integers within the range of the type | Fixed-width sequence of bytes in little endian order |
int |
Integers | A variable-length format |
nat |
Non-negative integers | Similar format to int , but without the sign bit |
string |
Strings | A length nat followed by a sequence of UTF-8 bytes with the specified length |
list T |
sequence of T |
A length nat followed by a sequence of T |
option T |
Two cases some(x) and none() |
A byte b . If b is non-zero, then it is followed by a T |
The encodings for int
and nat
define a sequence of bits in little-endian order.
The highest bit in each byte is set if there are more bytes in the number.
This encoding is a sequence of bytes [B0, ..., Bn] where Bi,7 = 1 when i < n and Bn,7 = 0. This sequence of bytes is equivalent to a sequence of bits [B0,0, ... B0,6, ..., Bn-1,0, ..., Bn-1,6] = [b0, ..., bm-1] where m = 6n. Essentially, the sequence of bits removes the flag bits that are used to determine when the sequence has reached the end and orders the remaining bits in each byte from least to most significant. The sequence of bits is mapped as follows:
- For the
int
type, if bm-1 = 0, then k = b0 * 20 + ... + bm - 2 * 2m-2 - For the
int
type, if bm-1 = 1, then k = -(b0 * 20 + ... + bm - 2 * 2m-2) - 1 - For the
nat
type, k = b0 * 20 + ... + bm - 1 * 2m-1
Versioning
In the following example, a user has a username and birth date.
struct Person {
version 1 {
name: Name;
dob: Date;
}
}
struct Name {
version 1 {
firstName: string;
middleName: option string;
lastName: string;
}
}
However, not everyone has 2 or 3 names. In order to accomodate this, we can create a new version that allows for an arbitrary number of names.
struct Name {
version 1 {
...
}
version 2 {
names: list string;
}
}
This change to Name
means that in version 2 of the format, the name
field of Person
will now use version 2 of Name
.
However, since there are no direct changes to Person
, version 2 is automaticly created.
In the generated code, the user is expected to provide code that can upgrade Name
from version 1 to version 2.
However, there is no need to provide such code for upgrading Person
.
Person
can be upgraded automaticially using the upgrade code for its fields.
Final
Versioned types can be declared as final
to indicate that no new versions of the type will be added.
This restricts the type to the last explicitly declared version, preventing newer versions from being automaticially generated.
Final types may only include fields of final or non-versioned types.
final struct FormatVersion {
version 1 {
major: nat;
}
}
Generics
Generic types allow a type to be parameterized.
final struct Pair(A, B) {
version 1 {
left: A;
right: B;
}
}
Constants
Constants allow for values to be defined that are shared between any generated languages.
Literal | Example | Usage |
---|---|---|
Integer | 88 |
extern types with integer literal |
String | "Hello World" |
extern types with string literal |
Sequence | [ a, b, c ] |
extern types with sequence literal |
Record | { x = 1; y = 2; } |
struct types and extern types with record literal |
Case | Name(a) |
enum types and extern types with case Name literal |
Command Line
Verilization has a command line interface. The following options are supported.
Language Generators
The following languages are supported.
Compiler Bindings
The verilization compiler is written in Rust. It can be compiled into WebAssembly for use in other languages. This has the advantage that a tool can be distributed (for example, as an NPM package, a standalone JAR, etc) without requiring any native binaries. These bindings expose both an interface that can be used directly from the runtime, as well as a command line interface that depends only on the associated runtime.
Currently, there are bindings for the following runtimes.
- Node
Dependencies
~2.5MB
~56K SLoC