41 releases (9 breaking)
Version | Released |
---|---|
0.9.60 | Mar 26, 2021 |
0.9.6 | Mar 10, 2021 |
0.7.3 | Dec 30, 2020 |
0.4.2 | Nov 20, 2020 |
0.0.0-beta.3 | Mar 21, 2020 |
NoProto: Flexible, Fast & Compact Serialization with RPC
Github | Crates.io | Documentation
Features
Lightweight
- Zero dependencies
- `no_std` support, WASM ready
- Most compact non compiling storage format
Stable
- Safely accept untrusted buffers
- Passes Miri compiler safety checks
- Panic and unwrap free
Easy
- Extensive Documentation & Testing
- Full interop with JSON, Import and Export JSON values
- Thoroughly documented & simple data storage format
Fast
- Zero copy deserialization
- Most updates are append only
- Deserialization is incremental
Powerful
- Native byte-wise sorting
- Supports recursive data types
- Supports most common native data types
- Supports collections (list, map, struct & tuple)
- Supports arbitrary nesting of collection types
- Schemas support default values and non destructive updates
- Transport agnostic RPC Framework.
Why ANOTHER Serialization Format?
- NoProto combines the performance of compiled formats with the flexibility of dynamic formats:
Compiled formats like Flatbuffers, Cap'n Proto and bincode have amazing performance and extremely compact buffers, but you MUST compile the data types into your application. This means that if the schema of the data changes, the application must be recompiled to accommodate the new schema.
Dynamic formats like JSON, MessagePack and BSON give you the flexibility to store any data with any schema at runtime, but the buffers are fat and performance is somewhere between horrible and hopefully acceptable.
NoProto takes the performance advantages of compiled formats and implements them in a flexible format.
- NoProto is a key-value database focused format:
Byte Wise Sorting: Ever try to store a signed integer as a sortable key in a database? NoProto can do that. Almost every data type is stored in the buffer as byte-wise sortable, meaning buffers can be compared at the byte level for sorting without deserializing.
Primary Key Management: Compound sortable keys are extremely easy to generate, maintain and update with NoProto. You don't need a custom sort function in your key-value store, you just need this library.
UUID & ULID Support: NoProto is one of the few formats that come with first class support for these popular primary key data types. It can easily encode, decode and generate these data types.
Fastest Updates: NoProto is the only format that supports all mutations without deserializing. It can do the common database read -> update -> write operation between 50x - 300x faster than other dynamic formats. See Benchmarks.
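To make the byte-wise sorting idea concrete, here is a minimal sketch of one common way to make a signed integer sortable as raw bytes: flip the sign bit and store the result in big endian. This is an illustration of the general technique only. The function names are hypothetical and it is an assumption, not a confirmed detail, that NoProto encodes integers exactly this way.

```rust
/// Encode an i32 so that comparing the output bytes lexicographically
/// gives the same order as comparing the original numbers.
/// Assumption: "flip the sign bit, store big endian" (illustrative only).
fn encode_sortable_i32(value: i32) -> [u8; 4] {
    ((value as u32) ^ 0x8000_0000).to_be_bytes()
}

/// Reverse of `encode_sortable_i32`.
fn decode_sortable_i32(bytes: [u8; 4]) -> i32 {
    (u32::from_be_bytes(bytes) ^ 0x8000_0000) as i32
}

/// Sort a set of values purely by their encoded bytes, then decode them.
fn sort_by_encoded_bytes(values: &[i32]) -> Vec<i32> {
    let mut keys: Vec<[u8; 4]> = values.iter().map(|v| encode_sortable_i32(*v)).collect();
    // Plain byte comparison, the same thing a key-value store does with keys.
    keys.sort();
    keys.into_iter().map(decode_sortable_i32).collect()
}
```

Calling `sort_by_encoded_bytes(&[7, -3, 0, -100])` orders the values numerically even though only bytes were compared, which is the property a key-value store needs from its keys.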
Comparison With Other Formats
Compared to Apache Avro
- Far more space efficient
- Significantly faster serialization & deserialization
- All values are optional (no void or null type)
- Supports more native types (like unsigned ints)
- Updates without deserializing/serializing
- Works with `no_std`.
- Safely handle untrusted data.
Compared to Protocol Buffers
- Comparable serialization & deserialization performance
- Updating buffers is an order of magnitude faster
- Schemas are dynamic at runtime, no compilation step
- Supports more types and better nested type support
- Byte-wise sorting is a first class operation
- Updates without deserializing/serializing
- Safely handle untrusted data.
- All values are optional and can be inserted in any order.
Compared to JSON / BSON
- Far more space efficient
- Significantly faster serialization & deserialization
- Deserialization is zero copy
- Has schemas / type safe
- Supports byte-wise sorting
- Supports raw bytes & other native types
- Updates without deserializing/serializing
- Works with `no_std`.
- Safely handle untrusted data.
Compared to Flatbuffers / Bincode
- Data types can change or be created at runtime
- Updating buffers is an order of magnitude faster
- Supports byte-wise sorting
- Updates without deserializing/serializing
- Works with `no_std`.
- Safely handle untrusted data.
- All values are optional and can be inserted in any order.
Format | Zero-Copy | Size Limit | Mutable | Schemas | Byte-wise Sorting |
---|---|---|---|---|---|
Runtime Libs | |||||
NoProto | ✓ | ~4GB | ✓ | ✓ | ✓ |
Apache Avro | ✗ | 2^63 Bytes | ✗ | ✓ | ✓ |
JSON | ✗ | Unlimited | ✓ | ✗ | ✗ |
BSON | ✗ | ~16MB | ✓ | ✗ | ✗ |
MessagePack | ✗ | Unlimited | ✓ | ✗ | ✗ |
Compiled Libs | |||||
FlatBuffers | ✓ | ~2GB | ✗ | ✓ | ✗ |
Bincode | ✓ | ? | ✓ | ✓ | ✗ |
Protocol Buffers | ✗ | ~2GB | ✗ | ✓ | ✗ |
Cap'N Proto | ✓ | 2^64 Bytes | ✗ | ✓ | ✗ |
Veriform | ✗ | ? | ✗ | ✗ | ✗ |
Quick Example
use no_proto::error::NP_Error;
use no_proto::NP_Factory;
// An ES6 like IDL is used to describe schema for the factory
// Each factory represents a single schema
// One factory can be used to serialize/deserialize any number of buffers
let user_factory = NP_Factory::new(r#"
struct({ fields: {
name: string(),
age: u16({ default: 0 }),
tags: list({ of: string() })
}})
"#)?;
// create a new empty buffer
let mut user_buffer = user_factory.new_buffer(None); // optional capacity
// set the "name" field
user_buffer.set(&["name"], "Billy Joel")?;
// read the "name" field
let name = user_buffer.get::<&str>(&["name"])?;
assert_eq!(name, Some("Billy Joel"));
// set a nested value, the first tag in the tag list
user_buffer.set(&["tags", "0"], "first tag")?;
// read the first tag from the tag list
let tag = user_buffer.get::<&str>(&["tags", "0"])?;
assert_eq!(tag, Some("first tag"));
// close buffer and get internal bytes
let user_bytes: Vec<u8> = user_buffer.finish().bytes();
// open the buffer again
let user_buffer = user_factory.open_buffer(user_bytes);
// read the "name" field again
let name = user_buffer.get::<&str>(&["name"])?;
assert_eq!(name, Some("Billy Joel"));
// get the age field
let age = user_buffer.get::<u16>(&["age"])?;
// returns default value from schema
assert_eq!(age, Some(0u16));
// close again
let user_bytes: Vec<u8> = user_buffer.finish().bytes();
// we can now save user_bytes to disk,
// send it over the network, or whatever else is needed with the data
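// the `?` operators above assume this snippet runs inside a function
// (or doctest) that returns Result<(), NP_Error>
Ok::<(), NP_Error>(())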
Guided Learning / Next Steps:
- Schemas: Learn how to build & work with schemas.
- Factories: Parsing schemas into something you can work with.
- Buffers: How to create, update & compact buffers/data.
- RPC Framework: How to use the RPC Framework APIs.
- Data & Schema Format: Learn how data is saved into the buffer and schemas.
Benchmarks
While it's difficult to properly benchmark libraries like these in a fair way, I've made an attempt in the table below. These benchmarks are available in the `bench` folder and you can easily run them yourself with `cargo run --release`.
The format and data used in the benchmarks were taken from the `flatbuffers` benchmarks GitHub repo. You should always benchmark/test your own use case for each library before making any choices on what to use.
Legend: Ops / Millisecond, higher is better
Format / Lib | Encode | Decode All | Decode 1 | Update 1 | Size (bytes) | Size (Zlib) |
---|---|---|---|---|---|---|
Runtime Libs | ||||||
NoProto | ||||||
no_proto | 1393 | 1883 | 55556 | 9524 | 308 | 198 |
Apache Avro | ||||||
avro-rs | 156 | 57 | 56 | 40 | 702 | 337 |
FlexBuffers | ||||||
flexbuffers | 444 | 962 | 24390 | 294 | 490 | 309 |
JSON | ||||||
json | 609 | 481 | 607 | 439 | 439 | 184 |
serde_json | 938 | 646 | 644 | 403 | 446 | 198 |
BSON | ||||||
bson | 129 | 116 | 123 | 90 | 414 | 216 |
rawbson | 130 | 1117 | 17857 | 89 | 414 | 216 |
MessagePack | ||||||
rmp | 661 | 623 | 832 | 202 | 311 | 193 |
messagepack-rs | 152 | 266 | 284 | 138 | 296 | 187 |
Compiled Libs | ||||||
Flatbuffers | ||||||
flatbuffers | 3165 | 16393 | 250000 | 2532 | 264 | 181 |
Bincode | ||||||
bincode | 6757 | 9259 | 10000 | 4115 | 163 | 129 |
Postcard | ||||||
postcard | 3067 | 7519 | 7937 | 2469 | 128 | 119 |
Protocol Buffers | ||||||
protobuf | 953 | 1305 | 1312 | 529 | 154 | 141 |
prost | 1464 | 2020 | 2232 | 1040 | 154 | 142 |
Abomonation | ||||||
abomonation | 2342 | 125000 | 500000 | 2183 | 261 | 160 |
Rkyv | ||||||
rkyv | 1605 | 37037 | 200000 | 1531 | 180 | 154 |
- Encode: Transfer a collection of fields of test data into a serialized `Vec<u8>`.
- Decode All: Deserialize the test object from the `Vec<u8>` into all fields.
- Decode 1: Deserialize the test object from the `Vec<u8>` into one field.
- Update 1: Deserialize, update a single field, then serialize back into `Vec<u8>`.
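For readers who want to reproduce a figure in the "Ops / Millisecond" unit on their own data, the sketch below shows one way such a number could be derived: time a closure over many iterations and divide. The helper names `ops_per_ms` and `measure` are hypothetical and this is not the code from the `bench` folder, just an illustration of the arithmetic behind the unit.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Convert an operation count and an elapsed time into operations per millisecond.
/// Returns 0.0 when no measurable time has elapsed, to avoid dividing by zero.
fn ops_per_ms(ops: u64, elapsed: Duration) -> f64 {
    let ms = elapsed.as_secs_f64() * 1000.0;
    if ms == 0.0 {
        return 0.0;
    }
    ops as f64 / ms
}

/// Run `op` `iterations` times and report the throughput in ops per millisecond.
/// `black_box` keeps the compiler from optimizing the measured work away.
fn measure<T, F: FnMut() -> T>(iterations: u64, mut op: F) -> f64 {
    let start = Instant::now();
    for _ in 0..iterations {
        black_box(op());
    }
    ops_per_ms(iterations, start.elapsed())
}
```

A call such as `measure(1_000_000, || encode(&test_data))`, where `encode` stands in for whichever operation you are comparing, yields a value in the same unit as the table. Absolute numbers will differ between machines, so only compare results measured on the same hardware.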
Runtime VS Compiled Libs: Some formats require data types to be compiled into the application, which increases performance but means data types cannot change at runtime. If data types need to mutate during runtime or can't be known before the application is compiled (like with databases), you must use a format that doesn't compile data types into the application, like JSON or NoProto.
Complete benchmark source code is available here. Suggestions for improving the quality of these benchmarks are appreciated.
NoProto Strengths
If your use case fits any of the points below, NoProto might be a good choice for your application.
- Flexible At Runtime
  If you need to work with data types that will change or be created at runtime, you normally have to pick something like JSON since highly optimized formats like Flatbuffers and Bincode depend on compiling the data types into your application (making everything fixed at runtime). When it comes to formats that can change/implement data types at runtime, NoProto is the fastest format we're aware of (if you know of one that might be faster, let us know!).
- Safely Accept Untrusted Data
  The worst case failure mode for NoProto buffers is junk data. While other formats can cause denial of service attacks or allow unsafe memory access, there is no such failure case with NoProto. There is no way to construct a NoProto buffer that would cause any detriment in performance to the host application or lead to unsafe memory access. Also, there is no panic causing code in the library, meaning it will never crash your application.
- Extremely Fast Updates
  If you have a workflow in your application that is read -> modify -> write with buffers, NoProto will usually outperform every other format, including Bincode and Flatbuffers. This is because NoProto never actually deserializes, it doesn't need to. This includes complicated mutations like pushing a value onto a nested list or replacing entire structs.
- All Fields Optional, Insert/Update In Any Order
  Many formats require that all values be present to close the buffer. Further, they may require data to be inserted in a specific order to accommodate the encoding/decoding scheme. With NoProto, all fields are optional and any update/insert can happen in any order.
- Incremental Deserializing
  You only pay for the fields you read, no more. There is no deserializing step in NoProto, opening a buffer performs no operations. Once you start asking for fields, the library will navigate the buffer using the format rules to get just what you asked for and nothing else. If you have a workflow in your application where you read a buffer and only grab a few fields inside it, NoProto will outperform most other libraries.
- Bytewise Sorting
  Almost all of NoProto's data types are designed to serialize into bytewise sortable values, including signed integers. When used with Tuples, making database keys with compound sorting is extremely easy. When you combine that with first class support for `UUID`s and `ULID`s, NoProto makes an excellent tool for parsing and creating primary keys for databases like RocksDB, LevelDB and TiKV.
- `no_std` Support
  If you need a serialization format with low memory usage that works in `no_std` environments, NoProto is one of the few good choices.
- Stable
  NoProto will never cause a panic in your application. It has zero panics or unwraps, meaning there is no code path that could lead to a panic. Fallback behavior is to provide a sane default path or bubble an error up to the caller.
- CPU Independent
  All numbers and pointers in NoProto buffers are always stored in big endian, so you can safely create buffers on any CPU architecture and know that they will work with any other CPU architecture.
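The bytewise sorting and big endian points above combine naturally into compound keys. The sketch below is a hand-rolled illustration, not the NoProto tuple API: it concatenates fixed-width big endian fields so that comparing whole keys as bytes sorts by the first field, then the second, then the third. The field layout (`tenant`, `timestamp`, `seq`) and the function name are hypothetical, and the sign-bit flip for the signed field is an assumed technique.

```rust
/// Build a compound key from three fixed-width fields.
/// Every field is written in big endian, so the bytes are identical
/// on any CPU architecture and sort field by field, left to right.
fn compound_key(tenant: u32, timestamp: i64, seq: u16) -> Vec<u8> {
    let mut key = Vec::with_capacity(4 + 8 + 2);
    key.extend_from_slice(&tenant.to_be_bytes());
    // Assumed technique: flip the sign bit so negative timestamps sort first.
    key.extend_from_slice(&((timestamp as u64) ^ (1u64 << 63)).to_be_bytes());
    key.extend_from_slice(&seq.to_be_bytes());
    key
}

/// Order keys exactly as a byte-ordered store such as RocksDB would by default.
fn sort_keys(mut keys: Vec<Vec<u8>>) -> Vec<Vec<u8>> {
    keys.sort();
    keys
}
```

Because every field has a fixed width, no separator bytes are needed and a prefix scan over the first four bytes returns all keys for one `tenant` in timestamp order.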
When to use Flatbuffers / Bincode / Cap'n Proto
If you can safely compile all your data types into your application, all the buffers/data are trusted, and you don't intend to mutate buffers after they're created, Bincode/Flatbuffers/Cap'n Proto is a better choice for you.
When to use JSON / BSON / MessagePack
If your data changes so often that schemas don't really make sense, or the format you use must be self describing, JSON/BSON/MessagePack is a better choice. Although I'd argue that if you can make schemas work, you should. Once you can use a format with schemas, you save a ton of space in the resulting buffers and performance is far better.
Limitations
- Structs and Tuples cannot have more than 255 items.
- Lists and Maps cannot have more than 2^16 (~64k) items.
- You cannot nest more than 255 levels deep.
- Struct field names cannot be longer than 255 UTF8 bytes.
- Enum/Option types are limited to 255 options and each option cannot be more than 255 UTF8 Bytes.
- Map keys cannot be larger than 255 UTF8 bytes.
- Buffers cannot be larger than 2^32 bytes or ~4GB.
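If you generate schemas or data programmatically, it can help to check these limits before handing anything to the library. The sketch below encodes a few of the numbers from the list above as constants and checks. All names here (`LimitError`, `check_struct_fields`, and so on) are hypothetical helpers for illustration and are not part of the NoProto API.

```rust
// Limits copied from the list above.
const MAX_STRUCT_FIELDS: usize = 255;
const MAX_FIELD_NAME_BYTES: usize = 255;
const MAX_LIST_ITEMS: usize = 1 << 16;
const MAX_BUFFER_BYTES: u64 = 1 << 32;

#[derive(Debug, PartialEq)]
enum LimitError {
    TooManyFields(usize),
    FieldNameTooLong(usize),
    TooManyListItems(usize),
    BufferTooLarge(u64),
}

/// Check the field count and the UTF-8 byte length of every field name.
fn check_struct_fields(names: &[&str]) -> Result<(), LimitError> {
    if names.len() > MAX_STRUCT_FIELDS {
        return Err(LimitError::TooManyFields(names.len()));
    }
    for name in names {
        // `str::len` counts UTF-8 bytes, not characters.
        if name.len() > MAX_FIELD_NAME_BYTES {
            return Err(LimitError::FieldNameTooLong(name.len()));
        }
    }
    Ok(())
}

/// Check a list or map item count.
fn check_list_len(items: usize) -> Result<(), LimitError> {
    if items > MAX_LIST_ITEMS {
        return Err(LimitError::TooManyListItems(items));
    }
    Ok(())
}

/// Check a total buffer size in bytes.
fn check_buffer_len(bytes: u64) -> Result<(), LimitError> {
    if bytes > MAX_BUFFER_BYTES {
        return Err(LimitError::BufferTooLarge(bytes));
    }
    Ok(())
}
```

Note that the name limits are in bytes, so a name of 128 two-byte characters such as `é` already exceeds the 255 byte limit.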
Unsafe
This library makes use of `unsafe` to get better performance. Generally speaking, it's not possible to have a high performance serialization library without `unsafe`. It is only used where performance improvements are significant, and additional checks are performed so that the worst case for any `unsafe` block is that it leads to junk data in a buffer.
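As an illustration of the "worst case is junk data, never a panic" goal, the sketch below reads a big endian `u32` from an untrusted byte slice with every bounds check made explicit. It uses only safe Rust and hypothetical function names. It shows the kind of check described above, not the library's actual internals.

```rust
/// Read a big endian u32 starting at `offset`.
/// Returns None instead of panicking when the offset overflows
/// or the buffer is too short.
fn read_u32_be(buffer: &[u8], offset: usize) -> Option<u32> {
    // `checked_add` guards against an attacker-supplied offset near usize::MAX.
    let end = offset.checked_add(4)?;
    // `get` returns None when the range is out of bounds.
    match buffer.get(offset..end)? {
        [a, b, c, d] => Some(u32::from_be_bytes([*a, *b, *c, *d])),
        _ => None,
    }
}

/// Fall back to a default value when the read is not possible.
fn read_u32_or_default(buffer: &[u8], offset: usize, default: u32) -> u32 {
    read_u32_be(buffer, offset).unwrap_or(default)
}
```

A corrupted or malicious offset then produces `None` or the default, and the caller decides whether to surface an error.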
MIT License
Copyright (c) 2021 Scott Lott
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.