1 unstable release
0.1.0 | Mar 12, 2022 |
---|
#7 in #bigtable
40KB
765 lines
Datafusion-Bigtable
Bigtable data source for Apache Arrow Datafusion
Run SQL on Bigtable
This crate implements Bigtable data source and Executor for Datafusion. It is built on top of gRPC client tonic.
Quick Start
let bigtable_datasource = BigtableDataSource::new(
"emulator".to_owned(), // project
"dev".to_owned(), // instance
"weather_balloons".to_owned(), // table
"measurements".to_owned(), // column family
vec!["_row_key".to_owned()], // table_partition_cols
"#".to_owned(), // table_partition_separator
vec![Field::new("pressure", DataType::Utf8, false)], // qualifiers
true, // only_read_latest
).await.unwrap();
let mut ctx = ExecutionContext::new();
ctx.register_table("weather_balloons", Arc::new(bigtable_datasource)).unwrap();
ctx.sql("SELECT \"_row_key\", pressure, \"_timestamp\" FROM weather_balloons where \"_row_key\" = 'us-west2#3698#2021-03-05-1200'").await?.collect().await?;
Roadmap
Bigtable
- ✅ UTF8 string
- ✅ 64-bit big-endian signed integer
SQL
- ✅ select by
"_row_key" =
- ✅ select by
"_row_key" IN
- ✅ select by
"_row_key" BETWEEN
- ✅ select by composite row keys
=
- ✅ select by composite row keys
IN
- ✅ select by composite row keys
BETWEEN
(only supported by last table_partition_cols)
General
- ✅ Projection pushdown
- Predicate push down
- Multi Thread or Partition aware execution
- Production ready Bigtable SDK in Rust
Note: datafusion-bigtable provides the physical Executor for Datafusion. Any aggregation, group by, join are implemented and handled by Datafusion.
Dependencies
~48–64MB
~1.5M SLoC