17 releases

| Version | Date |
|---|---|
| 0.3.10 | Sep 11, 2024 |
| 0.3.7 | Aug 9, 2024 |
| 0.3.6 | Jun 14, 2024 |
| 0.3.3 | Nov 21, 2023 |
| 0.1.0 | Nov 17, 2022 |
#814 in Encoding
370 downloads per month
Used in 6 crates (4 directly)
82KB, 2K SLoC
serde_columnar
serde_columnar is an ergonomic columnar storage encoding crate that offers forward and backward compatibility.
It lets data that needs to be serialized and deserialized be encoded into binary using columnar storage, all through simple macro annotations.
For a more detailed introduction, please refer to this Notion page: Serde-Columnar.
🚧 This crate is a work in progress and is not yet stable; it should not be used in production environments.
Features 🚀
serde_columnar comes with several notable features:
- 🗜️ Utilizes columnar storage in conjunction with various compression strategies to significantly reduce the size of the encoded content.
- 🔄 Built-in forward and backward compatibility solutions, eliminating the need for maintaining additional version codes.
- 🌳 Supports nested columnar storage.
- 📦 Supports list and map containers.
- 🔄 Supports deserialization in iterator form.
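To make "columnar storage" concrete, here is a minimal std-only Rust sketch of the underlying idea: a slice of rows is split into one vector per field, and the rows can be rebuilt from those columns. This is not serde_columnar's code; the Row and Columns types and the to_columns/from_columns functions are hypothetical names used only for illustration.

```rust
// A hypothetical row type, similar to a struct you might annotate with #[columnar].
#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: u64,
    married: bool,
}

// Column-oriented view: one Vec per field instead of one struct per row.
#[derive(Debug, Default, PartialEq)]
struct Columns {
    ids: Vec<u64>,
    married: Vec<bool>,
}

// Split a slice of rows into per-field columns.
fn to_columns(rows: &[Row]) -> Columns {
    let mut cols = Columns::default();
    for r in rows {
        cols.ids.push(r.id);
        cols.married.push(r.married);
    }
    cols
}

// Rebuild the rows from the columns (the inverse of `to_columns`).
fn from_columns(cols: &Columns) -> Vec<Row> {
    cols.ids
        .iter()
        .zip(cols.married.iter())
        .map(|(&id, &married)| Row { id, married })
        .collect()
}

fn main() {
    let rows = vec![
        Row { id: 1, married: true },
        Row { id: 2, married: true },
        Row { id: 3, married: false },
    ];
    let cols = to_columns(&rows);
    // Each column holds values of a single type stored contiguously.
    println!("{:?}", cols);
    assert_eq!(from_columns(&cols), rows);
}
```

Keeping each field's values together is what allows a per-field compression strategy to see long runs of similar, same-typed values.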
How to use
Install
cargo add serde_columnar
Or edit your Cargo.toml and add serde_columnar as a dependency:
[dependencies]
serde_columnar = "0.3.10"
Container Attribute
- vec:
  - Declare that this struct is a row of a vec-like container.
- map:
  - Declare that this struct is a row of a map-like container.
- ser:
  - Automatically derive the Serialize trait for this struct.
- de:
  - Automatically derive the Deserialize trait for this struct.
- iterable:
  - Declare that this struct is iterable.
  - Only available for row structs.
  - See Iterable for more details.
Field Attribute
- strategy:
  - The columnar compression strategy applied to this field.
  - Possible values: Rle / DeltaRle / BoolRle / DeltaOfDelta.
  - Only available for row structs.
- class:
  - Declare that this field is a container for rows. The field's type is usually Vec or HashMap, or a variant of them.
  - Possible values: vec or map.
  - Only available for table structs.
- skip:
  - Same as #[serde(skip)]: do not serialize or deserialize this field.
- borrow:
  - Same as #[serde(borrow)]: borrow data for this field from the deserializer using zero-copy deserialization.
  - Use #[columnar(borrow="'a + 'b")] to specify explicitly which lifetimes should be borrowed.
  - Only available for table structs for now.
- iter:
  - Declare the iterable row type used when deserializing in iter mode.
  - Only available for fields marked with class, and only for class="vec".
- optional & index:
  - To achieve forward and backward compatibility, fields that may change can be marked as optional.
  - To avoid errors in the future, such as changing the order of optional fields, each optional field must also be given an index.
  - All optional fields must come after the other fields.
  - The index is the unique identifier of the optional field and is encoded into the result. If the corresponding identifier cannot be found during deserialization, Default is used.
  - Optional fields can be added or removed in future versions. Compatibility requires that the field type for a given index does not change, or that the encoding formats are compatible (such as changing u32 to u64).
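The four strategy values name well-known encodings. The std-only sketch below illustrates the general idea behind each one on plain slices. It is not serde_columnar's implementation or wire format: the function names (rle, deltas, delta_rle, bool_rle, delta_of_delta) are hypothetical, and the boolean scheme (alternating run lengths starting with false) is one common choice, assumed here for illustration.

```rust
// Run-length encoding: consecutive equal values collapse into (value, count).
fn rle<T: Clone + PartialEq>(values: &[T]) -> Vec<(T, usize)> {
    let mut runs: Vec<(T, usize)> = Vec::new();
    for v in values {
        if let Some(last) = runs.last_mut() {
            if last.0 == *v {
                last.1 += 1; // extend the current run
                continue;
            }
        }
        runs.push((v.clone(), 1)); // start a new run
    }
    runs
}

// Delta encoding: each value is stored as the difference from the previous one
// (the first value is stored as its difference from 0).
fn deltas(values: &[i64]) -> Vec<i64> {
    let mut prev = 0i64;
    values
        .iter()
        .map(|&v| {
            let d = v.wrapping_sub(prev);
            prev = v;
            d
        })
        .collect()
}

// Delta + RLE: steadily increasing ids become long runs of equal deltas.
fn delta_rle(values: &[i64]) -> Vec<(i64, usize)> {
    rle(&deltas(values))
}

// Boolean RLE (assumed scheme): alternating run lengths, starting with `false`.
fn bool_rle(values: &[bool]) -> Vec<usize> {
    let mut runs = Vec::new();
    let mut current = false;
    let mut count = 0usize;
    for &b in values {
        if b == current {
            count += 1;
        } else {
            runs.push(count);
            current = b;
            count = 1;
        }
    }
    runs.push(count);
    runs
}

// Delta-of-delta: differences of differences; regular timestamps become mostly zeros.
fn delta_of_delta(values: &[i64]) -> Vec<i64> {
    deltas(&deltas(values))
}

fn main() {
    println!("{:?}", rle(&["F", "F", "M"])); // [("F", 2), ("M", 1)]
    println!("{:?}", delta_rle(&[1, 2, 3, 4, 10])); // [(1, 4), (6, 1)]
    println!("{:?}", bool_rle(&[true, true, false])); // [0, 2, 1]
    println!("{:?}", delta_of_delta(&[100, 110, 120, 130])); // [100, -90, 0, 0]
}
```

The choice of strategy depends on the shape of the column: Rle suits repeated values, DeltaRle suits counters and ids, BoolRle suits flags, and DeltaOfDelta suits evenly spaced values such as timestamps.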
Examples
use std::borrow::Cow;
use serde_columnar::{columnar, from_bytes, to_vec};

#[columnar(vec, ser, de)] // this struct can be a row of a vec-like container
struct RowStruct {
    name: String,
    #[columnar(strategy = "DeltaRle")] // this field will be encoded by `DeltaRle`
    id: u64,
    #[columnar(strategy = "Rle")] // this field will be encoded by `Rle`
    gender: String,
    #[columnar(strategy = "BoolRle")] // this field will be encoded by `BoolRle`
    married: bool,
    #[columnar(strategy = "DeltaOfDelta")] // this field will be encoded by `DeltaOfDelta`
    time: i64,
    // optional fields must come after all other fields
    #[columnar(optional, index = 0)] // optional: can be added in this version or removed in a future one
    future: String,
}

#[columnar(ser, de)] // derive `Serialize` and `Deserialize`
struct TableStruct<'a> {
    #[columnar(class = "vec")] // this field is a vec-like table container
    pub data: Vec<RowStruct>,
    #[columnar(borrow)] // the same as `#[serde(borrow)]`
    pub text: Cow<'a, str>,
    #[columnar(skip)] // the same as `#[serde(skip)]`
    pub ignore: u8,
    #[columnar(optional, index = 0)] // table containers also support optional fields
    pub other_data: u64,
}

let table = TableStruct::new(...); // placeholder: construct your table here
let bytes = to_vec(&table).unwrap();
let table_from_bytes = from_bytes::<TableStruct>(&bytes).unwrap();
You can find more examples of serde_columnar in examples and tests.
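The optional fields in the example above rely on their index to stay compatible across versions. The std-only sketch below models that idea: optional fields travel as (index, value) pairs, an older reader ignores indices it does not know, and a newer reader falls back to Default when an index is missing. The OptionalFields, V1, V2 and by_index names are hypothetical, and this is not serde_columnar's actual encoding.

```rust
use std::collections::HashMap;

// Hypothetical model: optional fields travel as (index, value) pairs.
type OptionalFields = Vec<(u8, u64)>;

// Collect the pairs into a lookup table keyed by index.
fn by_index(fields: &OptionalFields) -> HashMap<u8, u64> {
    fields.iter().copied().collect()
}

// Version 1 of a struct knows only the optional field with index 0.
#[derive(Debug, PartialEq)]
struct V1 {
    other_data: u64,
}

// Version 2 adds a new optional field with index 1.
#[derive(Debug, PartialEq)]
struct V2 {
    other_data: u64,
    extra: u64,
}

impl V1 {
    fn write(&self) -> OptionalFields {
        vec![(0, self.other_data)]
    }
    fn read(fields: &OptionalFields) -> Self {
        let m = by_index(fields);
        // Unknown indices (such as 1, written by V2) are simply ignored.
        V1 { other_data: m.get(&0).copied().unwrap_or_default() }
    }
}

impl V2 {
    fn write(&self) -> OptionalFields {
        vec![(0, self.other_data), (1, self.extra)]
    }
    fn read(fields: &OptionalFields) -> Self {
        let m = by_index(fields);
        // A missing index (data written by V1) falls back to Default.
        V2 {
            other_data: m.get(&0).copied().unwrap_or_default(),
            extra: m.get(&1).copied().unwrap_or_default(),
        }
    }
}

fn main() {
    // Forward compatibility: an old reader decodes newer data.
    println!("{:?}", V1::read(&V2 { other_data: 7, extra: 9 }.write()));
    // Backward compatibility: a new reader decodes older data.
    println!("{:?}", V2::read(&V1 { other_data: 3 }.write()));
}
```

Because fields are matched by index rather than by position, reordering or removing optional fields does not shift the meaning of the others.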
Iterable
Columnar compression encoding assumes that each field is iterable. This means that during deserialization we can borrow the encoded bytes entirely and obtain the data as an iterator, without allocating memory for all of it up front. This is also implemented entirely through macros.
To use iter mode when deserializing, you only need to do 3 things:
- mark all row structs with iterable
- mark the row container field with iter="..."
- use serde_columnar::iter_from_bytes to deserialize
#[columnar(vec, ser, de, iterable)]
struct Row {
    #[columnar(strategy = "Rle")]
    rle: String,
    #[columnar(strategy = "DeltaRle")]
    delta_rle: u64,
    other: u8,
}

#[columnar(ser, de)]
struct Table {
    #[columnar(class = "vec", iter = "Row")]
    vec: Vec<Row>,
    other: u8,
}

let table = Table::new(...); // placeholder: construct your table here
let bytes = serde_columnar::to_vec(&table).unwrap();
let table_iter = serde_columnar::iter_from_bytes::<Table>(&bytes).unwrap();
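The iter mode described above decodes values lazily while borrowing the encoded data. The std-only sketch below shows that idea for a single run-length-encoded column: the iterator holds only a borrowed slice of (value, count) runs and yields one decoded value at a time, never building a Vec of all values. RleIter is a hypothetical type; the iterator types serde_columnar generates through its macros may look quite different.

```rust
// Lazily decodes a run-length-encoded column while borrowing the encoded runs.
// `RleIter` is a hypothetical type for illustration, not part of serde_columnar.
struct RleIter<'a> {
    runs: &'a [(u64, usize)], // remaining encoded runs (value, count), borrowed
    remaining: usize,         // repeats left in the current (first) run
}

impl<'a> RleIter<'a> {
    fn new(runs: &'a [(u64, usize)]) -> Self {
        let remaining = runs.first().map_or(0, |run| run.1);
        RleIter { runs, remaining }
    }
}

impl<'a> Iterator for RleIter<'a> {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        // Skip exhausted (or zero-length) runs; return None when no runs are left.
        while self.remaining == 0 {
            self.runs = self.runs.get(1..)?;
            self.remaining = self.runs.first()?.1;
        }
        self.remaining -= 1;
        Some(self.runs[0].0)
    }
}

fn main() {
    let encoded = [(5u64, 2usize), (7, 1)];
    // Values are produced one at a time; no Vec of decoded values is built.
    for value in RleIter::new(&encoded) {
        println!("{}", value);
    }
}
```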
Acknowledgements
- serde: Serialization framework for Rust.
- postcard: Postcard is a #![no_std] focused serializer and deserializer for Serde. We use it as the serializer and deserializer to provide VLE and ZigZag encoding.
- Automerge: Automerge is an excellent CRDT framework; we reused its RLE encoding code.
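VLE (variable-length encoding) and ZigZag are standard integer encodings. A minimal std-only sketch of each follows; the function names are hypothetical, and this is not postcard's code, whose exact byte layout may differ in details. ZigZag maps small negative numbers (for example, the deltas produced by delta encoding) to small unsigned numbers, and the varint then stores those in as few bytes as possible.

```rust
// ZigZag maps signed to unsigned so small magnitudes stay small:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
fn zigzag(n: i64) -> u64 {
    ((n << 1) ^ (n >> 63)) as u64
}

// Inverse of `zigzag`.
fn unzigzag(z: u64) -> i64 {
    ((z >> 1) as i64) ^ -((z & 1) as i64)
}

// Variable-length encoding (LEB128 style): 7 data bits per byte,
// with the high bit set when more bytes follow.
fn varint(mut v: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

fn main() {
    let mut buf = Vec::new();
    for n in [0i64, -1, 1, -2, 150] {
        let z = zigzag(n);
        assert_eq!(unzigzag(z), n); // round-trips
        varint(z, &mut buf);
    }
    println!("{:?}", buf);
}
```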
Dependencies
~1.3–2.1MB, ~45K SLoC