iceberg-catalog

Server implementation of the Iceberg Catalog REST API based on axum and iceberg-rust

1 unstable release

0.0.0 Jun 13, 2024

Apache-2.0

5KB

Iceberg Catalog - The TIP of the Iceberg

This is TIP: A Rust-native implementation of the Apache Iceberg REST Catalog specification based on apache/iceberg-rust.

Scope and Features

The Iceberg Protocol (TIP), based on REST, has become the standard for catalogs in open Lakehouses. It natively enables multi-table commits, server-side deconflicting, and much more. It is, figuratively, the TIP of the Iceberg.

We started this implementation because existing Iceberg Catalogs lacked customizability, support for on-premise deployments, and other features that are important to us. Some of our focuses with this implementation are:

  • Customizable: Our implementation is meant to be extended. We expose the Database implementation, Secrets, Authorization, EventPublishing and ContractValidation as interfaces (Traits). This allows you to tap into any access management system of your company or stream change events to any system whose API you know, simply by implementing a handful of methods. See the Customization Guide for more details.
  • Change Events: Built-in support for emitting change events (CloudEvents), which enables you to react to any change that happens to your tables.
  • Change Approval: Changes can also be prohibited by external systems. This can be used to prohibit changes to tables that would invalidate Data Contracts, Quality SLOs etc. Simply integrate with your own change approval via our ContractVerification trait.
  • Multi-Tenant capable: A single deployment of our catalog can serve multiple projects - all with a single entrypoint. All Iceberg and Warehouse configurations are completely separated between Warehouses.
  • Written in Rust: Single 18 MB all-in-one binary; no JVM or Python environment required.
  • Storage Access Management: Built-in S3-Signing that enables support for self-hosted as well as AWS S3 WITHOUT sharing S3 credentials with clients. We are also working on vended-credentials!
  • Well-Tested: Integration-tested with Spark and pyiceberg (S3 with this catalog is supported from pyiceberg 0.7.0 onward).
  • Highly Available & Horizontally Scalable: There is no local state, so the catalog can be scaled horizontally and updated without downtime.
  • Fine Grained Access (FGA) (Coming soon): Simple role-based access control is not enough for many rapidly evolving Data & Analytics initiatives. We are leveraging OpenFGA, which is based on Google's Zanzibar paper, to implement authorization. If your company already has a different system in place, you can integrate with it by implementing a handful of methods in the AuthZHandler trait.
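
To illustrate the trait-based customization, change approval, and change events described in the list above, here is a minimal sketch. It does not use the crate's real API: every name in it (TableChange, Verdict, ChangeApproval, EventSink, ProtectedColumns, InMemorySink, commit) is hypothetical and chosen only for illustration. It assumes that a change is first checked by an approval hook and that an event is published only when the change is accepted.

```rust
/// Hypothetical description of a proposed table change (not the crate's real type).
#[derive(Debug, Clone)]
pub struct TableChange {
    pub warehouse: String,
    pub table: String,
    pub dropped_columns: Vec<String>,
}

/// Hypothetical outcome of a change-approval check.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Approved,
    Rejected(String),
}

/// Hypothetical stand-in for a change-approval interface.
pub trait ChangeApproval {
    fn check(&self, change: &TableChange) -> Verdict;
}

/// Hypothetical stand-in for an event-publishing interface.
pub trait EventSink {
    fn publish(&mut self, event: String);
}

/// Rejects changes that drop a column protected by a data contract.
pub struct ProtectedColumns {
    pub protected: Vec<String>,
}

impl ChangeApproval for ProtectedColumns {
    fn check(&self, change: &TableChange) -> Verdict {
        for col in &change.dropped_columns {
            if self.protected.contains(col) {
                return Verdict::Rejected(format!("column '{col}' is under contract"));
            }
        }
        Verdict::Approved
    }
}

/// Collects events in memory; a real sink might forward them to a message broker.
#[derive(Default)]
pub struct InMemorySink {
    pub events: Vec<String>,
}

impl EventSink for InMemorySink {
    fn publish(&mut self, event: String) {
        self.events.push(event);
    }
}

/// Accepts a change only if the approval hook allows it, then emits a change event.
pub fn commit(
    change: &TableChange,
    approval: &dyn ChangeApproval,
    sink: &mut dyn EventSink,
) -> Result<(), String> {
    match approval.check(change) {
        Verdict::Approved => {
            sink.publish(format!("table.updated {}.{}", change.warehouse, change.table));
            Ok(())
        }
        Verdict::Rejected(reason) => Err(reason),
    }
}

fn main() {
    let approval = ProtectedColumns { protected: vec!["customer_id".to_string()] };
    let mut sink = InMemorySink::default();
    let change = TableChange {
        warehouse: "analytics".to_string(),
        table: "orders".to_string(),
        dropped_columns: vec!["customer_id".to_string()],
    };
    // The protected column is dropped, so the change is rejected and no event is emitted.
    println!("{:?}", commit(&change, &approval, &mut sink));
    println!("events emitted: {}", sink.events.len());
}
```

Because commit only depends on the two traits, swapping the in-memory sink or the column check for a company-specific system would not require touching the commit logic, which is the kind of extension point the list above describes.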

The following is an overview of currently supported features. Please also check the Issues if something you need is missing.

WIP

We are working on releasing the crate here. In the meantime, feel free to pull the code directly from GitHub.