icepick
A CLI tool and wasm-compatible library for managing Apache Iceberg tables in AWS S3 Tables and Cloudflare R2 Data Catalog.
What it does
icepick provides a simple command-line interface and wasm-friendly library for working with Apache Iceberg tables:
- List and inspect namespaces and tables
- Scan tables with partition pruning and column statistics
- Commit Parquet files to tables (with auto-detection of Hive-style partitions)
- Compact small files using bin-pack compaction
- Clean up snapshots based on retention policies
Why?
The official iceberg-rust library doesn't yet support WASM compilation, and most Iceberg tooling targets JVM environments. icepick fills that gap for:
- Serverless environments like Cloudflare Workers
- CLI-first workflows without spinning up Spark or Flink
- Lightweight table maintenance (compaction, snapshot cleanup)
- Quick data exploration without complex query engines
Quickstart
Install
cargo install icepick --features cli
Configure
Set your catalog credentials:
# For Cloudflare R2
export ICEPICK_CATALOG_URL="https://catalog.cloudflarestorage.com/<account-id>/<bucket>"
export ICEPICK_TOKEN="<cloudflare-api-token>"
# For AWS S3 Tables
export ICEPICK_CATALOG_ARN="arn:aws:s3tables:us-west-2:123456789012:bucket/my-bucket"
# Uses AWS credential chain (env vars, ~/.aws/credentials, IAM role)
Verify Connection
# List namespaces
icepick namespace list
# List tables in a namespace
icepick table list --namespace my_namespace
# Get table info
icepick table info my_namespace.my_table
CLI Reference
Namespaces
# List all namespaces
icepick namespace list
# Create a namespace
icepick namespace create my_namespace
# Delete a namespace
icepick namespace delete my_namespace
Tables
# List tables in a namespace
icepick table list --namespace my_namespace
# Get detailed table info (schema, partitioning, snapshots)
icepick table info my_namespace.my_table
# Scan table data (shows pruning stats with filters)
icepick table scan my_namespace.my_table
# Scan with filter
icepick table scan my_namespace.my_table --filter "date >= '2024-01-01'"
# Limit output rows
icepick table scan my_namespace.my_table --limit 100
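Partition pruning means partitions whose recorded statistics cannot match the filter are skipped entirely. A rough sketch of the idea, with made-up per-partition max values (not icepick's actual planner):

```shell
# Rough sketch of partition pruning: a partition whose recorded max value
# is below the filter's lower bound can be skipped without reading data.
filter_min="2024-01-01"
scanned=""; pruned=""
for stat in "date=2023-12:2023-12-31" "date=2024-01:2024-01-31"; do
  part="${stat%%:*}"   # partition name
  max="${stat##*:}"    # made-up max statistic for that partition
  if [[ "$max" < "$filter_min" ]]; then
    pruned="$pruned $part"
    echo "prune $part (max=$max)"
  else
    scanned="$scanned $part"
    echo "scan  $part (max=$max)"
  fi
done
```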
Commit Files
Commit existing Parquet files to an Iceberg table:
# Preview what would be committed (dry run)
icepick commit /data/**/*.parquet --namespace prod --table events --dry-run
# Commit files to existing table
icepick commit /data/**/*.parquet --namespace prod --table events
# Create new table with partition spec
icepick commit /data/**/*.parquet --namespace prod --table events \
--create --partition year:int,month:int
# For non-Hive paths, specify partition values explicitly
icepick commit /flat/*.parquet --namespace prod --table events \
--partition-values year=2024,month=01
# Use specific file as schema exemplar
icepick commit /data/**/*.parquet --namespace prod --table events \
--exemplar /data/sample.parquet --create
The commit command:
- Uses the first file's schema (or `--exemplar`) as the reference
- Validates that all files match the schema
- Extracts partition values from Hive-style paths automatically
- Supports `--partition-values` for flat directory structures
- Shows a detailed plan with `--dry-run` before committing
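"Hive-style" means the `key=value` segments of a file's path carry its partition values. A minimal illustration with a hypothetical path (not icepick's internal parser):

```shell
# Hive-style partition layout: each key=value path segment is one
# partition value for the file it contains.
path="/data/year=2024/month=01/part-000.parquet"

# Split the path on '/' and keep only the key=value segments.
parts=$(echo "$path" | tr '/' '\n' | grep '=')
echo "$parts"
```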
Compaction
Merge small files into larger ones for better query performance:
# Preview compaction plan (dry run)
icepick compact my_namespace.my_table --dry-run
# Execute compaction with default settings
icepick compact my_namespace.my_table
# Custom target file size (256 MB)
icepick compact my_namespace.my_table --target-size 268435456
# Only compact files smaller than 128 MB
icepick compact my_namespace.my_table --max-input-size 134217728
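The byte values above are plain powers of two (268435456 = 256 × 1024 × 1024). Bin-pack compaction, in general, greedily accumulates small files into a group until adding the next file would exceed the target size; a toy sketch with made-up file sizes (not icepick's actual planner):

```shell
# Toy bin-pack grouping: files (sizes in MB) are packed into bins,
# starting a new bin whenever the running total would exceed the target.
target=256
sizes="40 60 100 80 30 120"
bin=1; total=0
for s in $sizes; do
  if [ $((total + s)) -gt "$target" ]; then
    bin=$((bin + 1))
    total=0
  fi
  total=$((total + s))
  echo "bin $bin: file ${s}MB (running total ${total}MB)"
done
```

With these inputs the first three files (40 + 60 + 100 = 200 MB) fill bin 1, and the rest go to bin 2.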
Snapshots
Manage table snapshots and clean up old versions:
# List all snapshots with age and status
icepick snapshot list my_namespace.my_table
# Preview cleanup (dry run)
icepick snapshot cleanup my_namespace.my_table --dry-run
# Execute cleanup with retention policy
icepick snapshot cleanup my_namespace.my_table \
--older-than-days 7 \
--retain-last 10
Snapshot cleanup respects:
- **Current snapshot**: never expired (it's the current table state)
- **Referenced snapshots**: never expired while referenced by branches or tags
- **Retention count**: keeps the N most recent regardless of age
- **Age threshold**: only expires snapshots older than the threshold
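The two flags combine conjunctively: a snapshot is expired only if it is both outside the N most recent and older than the threshold. A sketch with made-up snapshot ages (not icepick's implementation):

```shell
# Retention rule sketch: snapshot ages in days, newest first.
# Expire only snapshots that are BOTH outside the retain-last window
# AND older than the age threshold.
ages="1 3 8 15 30"
older_than=7
retain_last=3
i=0; expired=""
for age in $ages; do
  i=$((i + 1))
  if [ "$i" -gt "$retain_last" ] && [ "$age" -gt "$older_than" ]; then
    expired="$expired $age"
    echo "expire snapshot age=${age}d"
  else
    echo "keep   snapshot age=${age}d"
  fi
done
```

Note that the 8-day-old snapshot survives despite exceeding the age threshold, because it is still within the three most recent.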
Cloudflare R2
Authentication
- Log into the Cloudflare dashboard
- Navigate to My Profile → API Tokens
- Create a token with R2 read/write permissions
- Set environment variables:
export ICEPICK_CATALOG_URL="https://catalog.cloudflarestorage.com/<account-id>/<bucket>"
export ICEPICK_TOKEN="<your-api-token>"
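A quick sanity check on the URL shape can catch copy-paste mistakes before the first request; a minimal sketch (the account id and bucket below are placeholders):

```shell
# Hypothetical sanity check: the R2 catalog URL should contain both an
# account id and a bucket segment after the host.
url="https://catalog.cloudflarestorage.com/0123456789abcdef/my-bucket"
case "$url" in
  https://catalog.cloudflarestorage.com/*/*) status="ok" ;;
  *) status="unexpected format" ;;
esac
echo "$status"
```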
WASM Compatibility
The R2 catalog is fully WASM-compatible, making it suitable for:
- Cloudflare Workers
- Browser applications (if your catalog REST API supports CORS)
AWS S3 Tables
Authentication
Uses the AWS default credential provider chain:
- Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`)
- AWS credentials file (`~/.aws/credentials`)
- IAM instance profile (EC2)
- ECS task role
export ICEPICK_CATALOG_ARN="arn:aws:s3tables:us-west-2:123456789012:bucket/my-bucket"
Important: Ensure your credentials have S3 Tables permissions.
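The catalog ARN follows the standard colon-delimited AWS ARN layout, so its fields are easy to pull apart for scripting (example account id from above):

```shell
# Split a standard AWS ARN into its colon-delimited fields.
arn="arn:aws:s3tables:us-west-2:123456789012:bucket/my-bucket"
IFS=':' read -r _ _ service region account resource <<< "$arn"
echo "service=$service region=$region account=$account resource=$resource"
```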
Platform Support
S3 Tables requires the AWS SDK and is only available on native platforms (Linux, macOS, Windows). It does not compile to WASM.
Library Usage
icepick can also be used as a Rust library for programmatic access to Iceberg tables. See DEVELOPER.md for:
- Rust API examples
- Direct Parquet writes
- Registering existing files
- WASM considerations
Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.
Acknowledgments
Built on the official iceberg-rust library from the Apache Iceberg project.