🌪️ Vortex

📚 Documentation | 📊 Performance Benchmarks

Overview

Vortex is a next-generation columnar file format and toolkit designed for high-performance data analytics. It provides:

⚡️ Blazing Fast Performance
- 100-200x faster random access reads than Apache Parquet
- 2-10x faster scans with similar compression ratios and write throughput
- Efficient support for wide tables with zero-copy/zero-parse metadata

🔧 Extensible Architecture
- Modeled after Apache DataFusion's extensible approach
- Pluggable encoding system
- Zero-copy compatibility with Apache Arrow

🚧 Development Status: This project is under active development. APIs and file formats may change, and some features are still being implemented.

Key Features

Core Capabilities

✨ Logical Types - Clean separation between logical schema and physical layout
🔄 Zero-Copy Arrow Integration - Seamless conversion to/from Apache Arrow arrays
🧩 Extensible Encodings - Pluggable physical layouts with built-in optimizations
📦 Cascading Compression - Support for nested encoding schemes
🚀 High-Performance Computing - Optimized compute kernels for encoded data
📊 Rich Statistics - Lazy-loaded summary statistics for optimization
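
To make "cascading compression" and "compute on encoded data" more concrete, here is a minimal, self-contained sketch in plain Rust. It does not use the Vortex API: `DictRle`, `compress`, `decompress`, and `count_eq` are hypothetical names chosen for illustration. The idea shown is that a string column is first dictionary-encoded, the resulting integer codes are then run-length encoded, and an equality count can be answered from the runs without decoding the column.

```rust
use std::collections::HashMap;

/// Hypothetical two-level encoding (not a Vortex type):
/// a dictionary of distinct strings plus run-length-encoded codes.
#[derive(Debug)]
struct DictRle {
    dict: Vec<String>,
    runs: Vec<(u32, usize)>, // (dictionary code, run length)
}

/// Level 1: replace each string with the index of its dictionary entry.
fn dict_encode(values: &[&str]) -> (Vec<String>, Vec<u32>) {
    let mut dict: Vec<String> = Vec::new();
    let mut index: HashMap<&str, u32> = HashMap::new();
    let mut codes = Vec::with_capacity(values.len());
    for &v in values {
        let code = *index.entry(v).or_insert_with(|| {
            dict.push(v.to_string());
            (dict.len() - 1) as u32
        });
        codes.push(code);
    }
    (dict, codes)
}

/// Level 2: collapse consecutive equal codes into (code, length) runs.
fn rle_encode(codes: &[u32]) -> Vec<(u32, usize)> {
    let mut runs: Vec<(u32, usize)> = Vec::new();
    for &c in codes {
        if let Some((last, len)) = runs.last_mut() {
            if *last == c {
                *len += 1;
                continue;
            }
        }
        runs.push((c, 1));
    }
    runs
}

/// Cascade the two encodings: dictionary first, then RLE over the codes.
fn compress(values: &[&str]) -> DictRle {
    let (dict, codes) = dict_encode(values);
    DictRle { dict, runs: rle_encode(&codes) }
}

/// Undo both levels and return the original strings.
fn decompress(col: &DictRle) -> Vec<String> {
    let mut out = Vec::new();
    for &(code, len) in &col.runs {
        for _ in 0..len {
            out.push(col.dict[code as usize].clone());
        }
    }
    out
}

/// Count rows equal to `needle` directly on the encoded form:
/// one dictionary lookup, then a sum over matching run lengths.
fn count_eq(col: &DictRle, needle: &str) -> usize {
    match col.dict.iter().position(|s| s.as_str() == needle) {
        Some(code) => col
            .runs
            .iter()
            .filter(|&&(c, _)| c as usize == code)
            .map(|&(_, len)| len)
            .sum(),
        None => 0,
    }
}
```

Calling `compress(&["a", "a", "a", "b", "b", "a"])` yields a two-entry dictionary and three runs, and `count_eq` only touches those three runs instead of six rows. Real encodings would also track nullability and use compact buffers; this sketch leaves that out.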

Technical Architecture

Logical vs Physical Design

Vortex strictly separates logical and physical concerns:

Logical Layer: Defines data types and schema
Physical Layer: Handles encoding and storage implementation
Built-in Encodings: Compatible with Apache Arrow's memory format
Extension Encodings: Optimized compression schemes (RLE, dictionary, etc.)
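
The separation described above can be sketched in a few lines of Rust. This is an illustration under stated assumptions, not the Vortex API: `LogicalType`, `Int64Encoding`, `Plain`, `RunLength`, and `sum` are hypothetical names. `Plain` stands in for a flat, Arrow-style buffer and `RunLength` for a compressed extension encoding; both report the same logical type, so code written against the logical layer does not need to know which physical layout it received.

```rust
/// Logical layer: what the values mean, independent of storage.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogicalType {
    Int64,
}

/// Physical layer: any layout that can serve 64-bit integers.
/// (Hypothetical trait for illustration only.)
trait Int64Encoding {
    fn logical_type(&self) -> LogicalType {
        LogicalType::Int64
    }
    fn len(&self) -> usize;
    /// Random access by row index; `None` if out of bounds.
    fn get(&self, i: usize) -> Option<i64>;
    /// Decode to a flat vector. Simple but not fast for every
    /// encoding; a real implementation would override it.
    fn decode(&self) -> Vec<i64> {
        (0..self.len()).filter_map(|i| self.get(i)).collect()
    }
}

/// Flat buffer, one value per row.
struct Plain(Vec<i64>);

impl Int64Encoding for Plain {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn get(&self, i: usize) -> Option<i64> {
        self.0.get(i).copied()
    }
}

/// Run-length layout: (value, run length) pairs.
struct RunLength {
    runs: Vec<(i64, usize)>,
}

impl Int64Encoding for RunLength {
    fn len(&self) -> usize {
        self.runs.iter().map(|&(_, n)| n).sum()
    }
    fn get(&self, i: usize) -> Option<i64> {
        // Walk the runs until the one containing row `i` is found.
        let mut remaining = i;
        for &(value, n) in &self.runs {
            if remaining < n {
                return Some(value);
            }
            remaining -= n;
        }
        None
    }
}

/// Written against the logical layer only: works for any encoding.
fn sum(col: &dyn Int64Encoding) -> i64 {
    col.decode().iter().sum()
}
```

With `Plain(vec![7, 7, 7, 9])` and `RunLength { runs: vec![(7, 3), (9, 1)] }`, `sum` returns the same result for both, because the two values differ only in physical layout.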

Quick Start

Installation

Rust Crate

All features are exported through the main vortex crate.

cargo add vortex

Python Package

uv add vortex-array

Command Line UI (vx)

For browsing the structure of Vortex files, you can use the vx command-line tool.

# Install latest release
cargo install vortex-tui --locked

# Or build from source
cargo install --path vortex-tui --locked

# Usage
vx browse <file>

Development Setup

Prerequisites (macOS)

# Optional but recommended dependencies
brew install flatbuffers protobuf  # For .fbs and .proto files
brew install duckdb               # For benchmarks

# Install Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# or
brew install rustup

# Initialize submodules
git submodule update --init --recursive

# Set up dependencies with uv
uv sync --all-packages

Performance Optimization

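For optimal performance, use MiMalloc as the global allocator (this requires adding the mimalloc crate as a dependency):

use mimalloc::MiMalloc;

// Route all heap allocations in this binary through MiMalloc.
#[global_allocator]
static GLOBAL_ALLOC: MiMalloc = MiMalloc;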

Project Information

License

Licensed under the Apache License, Version 2.0

Governance

Vortex is committed to remaining open-source, following governance models inspired by the Substrait project and Apache Software Foundation.

Contributing

See CONTRIBUTING.md for guidelines.

Acknowledgments 🏆

This project builds upon groundbreaking work from the academic and open-source communities:

Key Research Papers

BtrBlocks - Efficient columnar compression
FastLanes - High-performance integer compression
FSST - Fast random access string compression
ALP - Adaptive lossless floating-point compression
Procella - YouTube's unified data system
Cloud Object Storage Analytics - High-performance analytics
ClickHouse - Fast analytics for everyone

Open Source Inspiration

Thanks to all contributors who have shared their knowledge and code with the community! 🚀
