1 unstable release

new 0.2.1 May 18, 2025

#20 in #data-lake

Apache-2.0

89KB
2K SLoC

Azof DataFusion Integration

This crate provides integration between Azof lakehouse format and Apache DataFusion.

Features

  • Register Azof tables in a DataFusion context
  • Query tables using SQL
  • Support for time-travel queries

Usage

use azof_datafusion::AzofTableProvider;
use datafusion::prelude::*;
use object_store::local::LocalFileSystem;
use object_store::path::Path;
use std::sync::Arc;

async fn query_example() -> Result<(), Box<dyn std::error::Error>> {
    // Set up paths and storage
    let store_path = Path::from("/path/to/lakehouse");
    let store = Arc::new(LocalFileSystem::new());
    
    // Create a DataFusion session
    let ctx = SessionContext::new();
    
    // Create a table provider for the current version
    let provider = AzofTableProvider::current(
        store_path.clone(), 
        store.clone(), 
        "my_table".to_string()
    )?;
    
    // Register with DataFusion
    ctx.register_table("my_table", Arc::new(provider))?;
    
    // Execute a SQL query
    let df = ctx.sql("SELECT * FROM my_table WHERE key = '1'").await?;
    df.show().await?;
    
    Ok(())
}

Time-Travel Queries

To query a table as of a specific time:

use chrono::{TimeZone, Utc};

let event_time = Utc.with_ymd_and_hms(2024, 2, 15, 0, 0, 0).unwrap();
let provider = AzofTableProvider::as_of(
    store_path, 
    store, 
    "my_table".to_string(),
    event_time
)?;

Running the Examples

cargo run --example query_example -p azof-datafusion

Dependencies

~71MB
~1.5M SLoC