1 unstable release

0.1.0 Feb 28, 2024

#474 in Build Utils

Apache-2.0

705KB
14K SLoC

Dashtool

Dashtool is a Lakehouse build tool that builds Iceberg tables from declarative SQL statements and generates Kubernetes workflows to keep these tables up-to-date. It handles Ingestion, Transformation and Orchestration.

Features

  • Uses declarative SQL select statements as input
  • git inspired data version control
  • Interoperability through Apache Iceberg Table format
  • Data ingestion through Singer.io
  • Data processing based on Datafusion
  • Workflow orchestration in Kubernetes through Argo Workflows

How it works

dashtool

Dashtool constructs a DAG by analyzing all .sql files in a directory structure and creates an Iceberg Materialized View for every file. Each file contains a SELECT statement for the Materialized View definition. Additionally, dashtool can use the DAG to create an Argo Workflow that refreshs the Materialized Views. During the workflow execution Argo starts Docker containers that run Datafusion to perform the refresh operation.

Examples

Usage

Check out the Documentation for a detailed description.

Build

The build command analyzes all .sql files in the subdirectories of the current directory and creates the corresponding Iceberg Materialized Views in the catalog.

dashtool build

Create Workflow

The workflow command creates a lineage DAG based on the .sql files and constructs an Argo workflow based on it. It stores the Workflow configuration file in argo/workflow.yaml.

dashtool workflow

Apply Workflow to the Kubernetes cluster

To apply the latest version of the workflow to the Kubernetes cluster run the following command:

kubectl apply -f argo/workflow.yaml

Installation

Homebrew

brew tap dashbook/dashtool
brew install dashtool

Cargo

cargo install dashtool

Dependencies

~141MB
~2.5M SLoC