
prost-arrow

Derives Apache Arrow array builders for protobuf messages generated by prost.

3 releases

0.0.3 Mar 29, 2024
0.0.2 Mar 29, 2024
0.0.1 Mar 27, 2024


Apache-2.0

14KB
264 lines


PROST! Apache Arrow Support

prost-arrow provides a derive macro that generates Apache Arrow array builders for any protobuf types generated by prost.

Usage

This crate provides the ToArrow trait and a proc-macro to derive it. The derive must be applied to every message type, so the simplest approach is to add it as a type_attribute with the catch-all path ".", which matches all generated types. The generated impls depend on the prost-arrow crate as well as a few arrow crates.
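One plausible reason every message needs the derive is that a message's builder is composed of builders for its nested message fields. The std-only sketch below illustrates that idea with a hypothetical `Columnar` trait and simplified `Point` and `Segment` types. None of these names come from prost-arrow, and the crate's generated code will differ.

```rust
// Hypothetical sketch: `Columnar`, `PointColumns`, `SegmentColumns`, and the
// simplified `Point`/`Segment` messages are invented for illustration only.

/// A type that knows how to append itself, column by column, to a builder.
trait Columnar: Sized {
    type Builder: Default;
    fn append(builder: &mut Self::Builder, value: Self);
}

struct Point {
    latitude: i32,
    longitude: i32,
}

/// One Vec per field: the columnar (struct-of-arrays) layout.
#[derive(Default, Debug)]
struct PointColumns {
    latitude: Vec<i32>,
    longitude: Vec<i32>,
}

impl Columnar for Point {
    type Builder = PointColumns;
    fn append(b: &mut PointColumns, v: Point) {
        b.latitude.push(v.latitude);
        b.longitude.push(v.longitude);
    }
}

struct Segment {
    start: Point,
    end: Point,
}

/// The outer builder is made of the nested type's builders, so it only
/// compiles if `Point` implements `Columnar` as well.
#[derive(Default)]
struct SegmentColumns {
    start: <Point as Columnar>::Builder,
    end: <Point as Columnar>::Builder,
}

impl Columnar for Segment {
    type Builder = SegmentColumns;
    fn append(b: &mut SegmentColumns, v: Segment) {
        // delegate each nested message field to the nested type's builder
        Point::append(&mut b.start, v.start);
        Point::append(&mut b.end, v.end);
    }
}

fn main() {
    let mut b = SegmentColumns::default();
    Segment::append(
        &mut b,
        Segment {
            start: Point { latitude: 1, longitude: 2 },
            end: Point { latitude: 3, longitude: 4 },
        },
    );
    println!("start: {:?}, end: {:?}", b.start, b.end);
}
```

In this sketch, removing `impl Columnar for Point` makes `SegmentColumns` fail to compile. That is likely why the derive is applied to every message through the catch-all path.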

You will need to add the following dependencies to your Cargo.toml:

arrow-array
arrow-buffer
arrow-schema
prost-arrow

In your build script (build.rs), configure whichever code generator you use:

// with prost-build
prost_build::Config::new()
    // derive ToArrow on every generated message ("." matches all types)
    .type_attribute(".", "#[derive(::prost_arrow::ToArrow)]")
    .compile_protos(&["proto/routeguide/route_guide.proto"], &["proto/"])
    .unwrap();

// or, with tonic-build (gRPC services)
tonic_build::configure()
    .type_attribute(".", "#[derive(::prost_arrow::ToArrow)]")
    .compile(&["proto/routeguide/route_guide.proto"], &["proto/"])
    .unwrap();

Finally, to get the array builder for a prost-generated type T that derives ToArrow, call prost_arrow::new_builder::<T>(). The returned builder implements arrow's base arrow_array::builder::ArrayBuilder trait, and also provides append_value and append_option methods that accept our prost type T.

// required trait imports
use arrow_array::builder::ArrayBuilder;
use prost_arrow::{ArrowBuilder, ToArrow};

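// Rectangle and Point are prost-generated structs that have ToArrow derived.
let mut builder = prost_arrow::new_builder::<Rectangle>();

// an example point for the `lo` field (values are arbitrary)
let pt_1 = Point {
    latitude: 0,
    longitude: 0,
};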

builder.append_value(Rectangle {
    lo: Some(pt_1),
    hi: None,
    messages: vec!["one".to_string(), "two".to_string()],
    extra_points: vec![
        Point {
            latitude: 1,
            longitude: 2,
        },
        Point {
            latitude: 3,
            longitude: 4,
        },
    ],
    binary: vec![0, 1, 2, 3],
    repeated_binary: vec![vec![10, 100]],
});
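The usage example above only calls append_value. To show what an append_value/append_option pair conventionally means for a struct-like Arrow builder, here is a std-only sketch of a hypothetical `PointBuilder` that tracks nulls in a validity vector (Arrow packs this into a bitmap). It is not prost-arrow's generated builder. The assumption that append_option(None) appends a null row follows the convention of arrow-rs's own builders and is not something this README states.

```rust
// Hypothetical sketch of append_value / append_option semantics for a
// struct-like columnar builder; not prost-arrow's generated code.
struct Point {
    latitude: i32,
    longitude: i32,
}

#[derive(Default, Debug)]
struct PointBuilder {
    latitude: Vec<i32>,
    longitude: Vec<i32>,
    // one entry per row: true = valid, false = null (Arrow's validity bitmap, unpacked)
    validity: Vec<bool>,
}

impl PointBuilder {
    fn append_value(&mut self, v: Point) {
        self.latitude.push(v.latitude);
        self.longitude.push(v.longitude);
        self.validity.push(true);
    }

    fn append_option(&mut self, v: Option<Point>) {
        match v {
            Some(p) => self.append_value(p),
            None => {
                // child columns still get a placeholder so that every
                // column keeps the same length as the parent
                self.latitude.push(0);
                self.longitude.push(0);
                self.validity.push(false);
            }
        }
    }

    fn len(&self) -> usize {
        self.validity.len()
    }

    fn null_count(&self) -> usize {
        self.validity.iter().filter(|valid| !**valid).count()
    }
}

fn main() {
    let mut b = PointBuilder::default();
    b.append_value(Point { latitude: 1, longitude: 2 });
    b.append_option(None);
    b.append_option(Some(Point { latitude: 3, longitude: 4 }));
    println!("len={} nulls={}", b.len(), b.null_count()); // prints "len=3 nulls=1"
}
```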

The builder can be used like any other arrow array builder, so the finish or finish_cloned methods from ArrayBuilder finalize the arrow array (in our case, a struct array). finish resets the builder afterwards, while finish_cloned leaves its contents in place.

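// additional imports for the conversion below
use arrow_array::{Array, RecordBatch, StructArray};

// finish the array builder to get an ArrayRef (this also resets the builder)
let arr = builder.finish();

// downcast the ArrayRef into a StructArray
let struct_arr = arr.as_any().downcast_ref::<StructArray>().unwrap();

// convert to a RecordBatch if desired; each struct field becomes a column
let record_batch: RecordBatch = struct_arr.into();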
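The difference between finish and finish_cloned matters when you want a snapshot while continuing to append. The following std-only sketch mirrors that contract with a hypothetical `Int32ColumnBuilder`. finish takes the buffered values and leaves the builder empty, and finish_cloned copies them and leaves the builder untouched. It models the behavior only and does not use arrow-rs types.

```rust
// Hypothetical sketch mirroring arrow-rs ArrayBuilder's finish / finish_cloned
// contract; a plain Vec<i32> stands in for the finished array.
#[derive(Default)]
struct Int32ColumnBuilder {
    values: Vec<i32>,
}

impl Int32ColumnBuilder {
    fn append_value(&mut self, v: i32) {
        self.values.push(v);
    }

    fn len(&self) -> usize {
        self.values.len()
    }

    /// Returns the buffered values and resets the builder to empty.
    fn finish(&mut self) -> Vec<i32> {
        std::mem::take(&mut self.values)
    }

    /// Returns a copy of the buffered values without resetting the builder.
    fn finish_cloned(&self) -> Vec<i32> {
        self.values.clone()
    }
}

fn main() {
    let mut b = Int32ColumnBuilder::default();
    b.append_value(1);
    b.append_value(2);
    let snapshot = b.finish_cloned(); // builder still holds [1, 2]
    b.append_value(3);
    let all = b.finish(); // [1, 2, 3]; builder is now empty
    println!("{:?} {:?} {}", snapshot, all, b.len()); // prints "[1, 2] [1, 2, 3] 0"
}
```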

Completeness

feature supported
primitive types
repeated fields
optional fields (via optional)
optional fields (via wrapper types) 🚧
well-known types (e.g. timestamp) 🚧
oneof fields 🚧
map fields 🚧
nested messages
recursive/cyclic messages
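The usage example includes repeated fields (messages, extra_points, repeated_binary). Arrow normally represents a repeated column as a list array: an offsets buffer with one more entry than there are rows, plus a single flat child array. Presumably prost-arrow maps repeated fields to such list arrays. The std-only sketch below shows only the Arrow list layout for a repeated string field, using hypothetical helpers `encode_list` and `decode_row`. It is not the crate's code.

```rust
// Std-only illustration of Arrow's variable-size list layout: offsets[i]..offsets[i + 1]
// delimits row i's elements in the flat `values` child array.
fn encode_list(rows: &[Vec<String>]) -> (Vec<i32>, Vec<String>) {
    let mut offsets = Vec::with_capacity(rows.len() + 1);
    let mut values = Vec::new();
    offsets.push(0); // offsets always start at 0
    for row in rows {
        values.extend(row.iter().cloned());
        offsets.push(values.len() as i32); // end of this row = start of the next
    }
    (offsets, values)
}

fn decode_row(offsets: &[i32], values: &[String], i: usize) -> Vec<String> {
    let (start, end) = (offsets[i] as usize, offsets[i + 1] as usize);
    values[start..end].to_vec()
}

fn main() {
    let rows = vec![
        vec!["one".to_string(), "two".to_string()],
        vec![], // an empty repeated field is just two equal offsets
        vec!["three".to_string()],
    ];
    let (offsets, values) = encode_list(&rows);
    // prints: offsets=[0, 2, 2, 3] values=["one", "two", "three"]
    println!("offsets={:?} values={:?}", offsets, values);
    println!("row 0 = {:?}", decode_row(&offsets, &values, 0));
}
```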

Dependencies

~5–11MB
~123K SLoC