#arrow #query #sql

datafusion-proto

Protobuf serialization of DataFusion logical plan expressions

12 major breaking releases

new 20.0.0 Mar 14, 2023
19.0.0 Feb 27, 2023
18.0.0 Feb 13, 2023
17.0.0 Jan 30, 2023
8.0.0 May 16, 2022

#79 in Encoding

Download history 680/week @ 2022-11-26 1232/week @ 2022-12-03 1114/week @ 2022-12-10 1200/week @ 2022-12-17 600/week @ 2022-12-24 1016/week @ 2022-12-31 1735/week @ 2023-01-07 2471/week @ 2023-01-14 2125/week @ 2023-01-21 2162/week @ 2023-01-28 1655/week @ 2023-02-04 1908/week @ 2023-02-11 2245/week @ 2023-02-18 3054/week @ 2023-02-25 2220/week @ 2023-03-04 1197/week @ 2023-03-11

9,092 downloads per month
Used in 8 crates (5 directly)

Apache-2.0

5MB
104K SLoC

DataFusion Proto

DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

This crate is a submodule of DataFusion that provides a protocol buffer format for representing query plans and expressions.

Serializing Expressions

Based on examples/expr_serde.rs

use datafusion_common::Result;
use datafusion_expr::{col, lit, Expr};
use datafusion_proto::bytes::Serializeable;

fn main() -> Result<()> {
    // Create a new `Expr` a < 32
    let expr = col("a").lt(lit(5i32));

    // Convert it to an opaque form
    let bytes = expr.to_bytes()?;

    // Decode bytes from somewhere (over network, etc.)
    let decoded_expr = Expr::from_bytes(&bytes)?;
    assert_eq!(expr, decoded_expr);
    Ok(())
}

Serializing Logical Plans

Based on examples/logical_plan_serde.rs

use datafusion::prelude::*;
use datafusion_common::Result;
use datafusion_proto::bytes::{logical_plan_from_bytes, logical_plan_to_bytes};

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();
    ctx.register_csv("t1", "testdata/test.csv", CsvReadOptions::default())
        .await
        ?;
    let plan = ctx.table("t1").await?.to_logical_plan()?;
    let bytes = logical_plan_to_bytes(&plan)?;
    let logical_round_trip = logical_plan_from_bytes(&bytes, &ctx)?;
    assert_eq!(format!("{:?}", plan), format!("{:?}", logical_round_trip));
    Ok(())
}

Serializing Physical Plans

Based on examples/physical_plan_serde.rs

use datafusion::prelude::*;
use datafusion_common::Result;
use datafusion_proto::bytes::{physical_plan_from_bytes,physical_plan_to_bytes};

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();
    ctx.register_csv("t1", "testdata/test.csv", CsvReadOptions::default())
        .await
        ?;
    let logical_plan = ctx.table("t1").await?.to_logical_plan()?;
    let physical_plan = ctx.create_physical_plan(&logical_plan).await?;
    let bytes = physical_plan_to_bytes(physical_plan.clone())?;
    let physical_round_trip = physical_plan_from_bytes(&bytes, &ctx)?;
    assert_eq!(format!("{:?}", physical_plan), format!("{:?}", physical_round_trip));
    Ok(())
}

Dependencies

~43MB
~878K SLoC