3 releases
| 0.0.1 | Sep 18, 2025 |
|---|---|
| 0.0.0-alpha1 | Aug 22, 2025 |
| 0.0.0-alpha0 | Aug 21, 2025 |
#4 in #arrow-schema
156 downloads per month
195KB
4K
SLoC
typed-arrow
typed-arrow provides a strongly typed, fully compile-time way to declare Arrow schemas in Rust. It maps Rust types directly to arrow-rs typed builders/arrays and arrow_schema::DataType — without any runtime DataType switching — enabling zero runtime cost, monomorphized column construction and ergonomic ORM-like APIs.
Why compile-time Arrow?
- Performance: monomorphized builders/arrays with zero dynamic dispatch; avoids runtime
DataTypematching. - Safety: column types, names, and nullability live in the type system; mismatches fail at compile time.
- Interop: uses
arrow-array/arrow-schematypes directly; no bespoke runtime layer to learn.
Quick Start
use typed_arrow::{prelude::*, schema::SchemaMeta};
use typed_arrow::{Dictionary, TimestampTz, Millisecond, Utc, List};
#[derive(typed_arrow::Record)]
struct Address { city: String, zip: Option<i32> }
#[derive(typed_arrow::Record)]
struct Person {
id: i64,
address: Option<Address>,
tags: Option<List<Option<i32>>>, // List column with nullable items
code: Option<Dictionary<i32, String>>, // Dictionary<i32, Utf8>
joined: TimestampTz<Millisecond, Utc>, // Timestamp(ms) with timezone (UTC)
}
fn main() {
// Build from owned rows
let rows = vec![
Person {
id: 1,
address: Some(Address { city: "NYC".into(), zip: None }),
tags: Some(List::new(vec![Some(1), None, Some(3)])),
code: Some(Dictionary::new("gold".into())),
joined: TimestampTz::<Millisecond, Utc>::new(1_700_000_000_000),
},
Person {
id: 2,
address: None,
tags: None,
code: None,
joined: TimestampTz::<Millisecond, Utc>::new(1_700_000_100_000),
},
];
let mut b = <Person as BuildRows>::new_builders(rows.len());
b.append_rows(rows);
let arrays = b.finish();
// Compile-time schema + RecordBatch
let batch = arrays.into_record_batch();
assert_eq!(batch.schema().fields().len(), <Person as Record>::LEN);
println!("rows={}, field0={}", batch.num_rows(), batch.schema().field(0).name());
}
Add to your Cargo.toml (derives enabled by default):
[dependencies]
typed-arrow = { version = "0.x" }
When working in this repository/workspace:
[dependencies]
typed-arrow = { path = "." }
Examples
Run the included examples to see end-to-end usage:
01_primitives— deriveRecord, inspectDataType, build primitives02_lists—List<T>andList<Option<T>>03_dictionary—Dictionary<K, String>04_timestamps—Timestamp<U>units04b_timestamps_tz—TimestampTz<U, Z>withUtcand custom markers05_structs— nested structs →StructArray06_rows_flat— row-based building for flat records07_rows_nested— row-based building with nested struct fields08_record_batch— compile-time schema +RecordBatch09_duration_interval— Duration and Interval types10_union— Dense Union as a Record column (with attributes)11_map— Map (incl.Option<V>values) + as a Record column12_ext_hooks— Extend#[derive(Record)]with visitor injection and macro callbacks
Run:
cargo run --example 08_record_batch
Core Concepts
Record: implemented by the derive macro for structs with named fields.ColAt<I>: per-column associated itemsRust,ColumnBuilder,ColumnArray,NULLABLE,NAME, anddata_type().ArrowBinding: compile-time mapping from a Rust value type to its Arrow builder, array, andDataType.BuildRows: derive generates<Type>Buildersand<Type>Arrayswithappend_row(s)andfinish.SchemaMeta: derive providesfields()andschema(); arrays structs provideinto_record_batch().AppendStructandStructMeta: enable nested struct fields andStructArraybuilding.
Metadata (Compile-time)
- Schema-level: annotate with
#[schema_metadata(k = "owner", v = "data")]. - Field-level: annotate with
#[metadata(k = "pii", v = "email")]. - You can repeat attributes to add multiple pairs; later duplicates win.
Nested Type Wrappers
- Struct fields: struct-typed fields map to Arrow
Structcolumns by default. Make the parent field nullable withOption<Nested>; child nullability is independent. - Lists:
List<T>(items non-null) andList<Option<T>>(items nullable). UseOption<List<_>>for list-level nulls. - LargeList:
LargeList<T>andLargeList<Option<T>>for 64-bit offsets; wrap withOption<_>for column nulls. - FixedSizeList:
FixedSizeList<T, N>(items non-null) andFixedSizeListNullable<T, N>(items nullable). Wrap withOption<_>for list-level nulls. - Map:
Map<K, V, const SORTED: bool = false>where keys are non-null; useMap<K, Option<V>>to allow nullable values. Column nullability viaOption<Map<...>>.SORTEDsetskeys_sortedin the ArrowDataType. - OrderedMap:
OrderedMap<K, V>usesBTreeMap<K, V>and declareskeys_sorted = true. - Dictionary:
Dictionary<K, V>with integral keysK ∈ { i8, i16, i32, i64, u8, u16, u32, u64 }and values:String/LargeUtf8(Utf8/LargeUtf8)Vec<u8>/LargeBinary(Binary/LargeBinary)[u8; N](FixedSizeBinary)- primitives
i*,u*,f32,f64Column nullability viaOption<Dictionary<..>>.
- Timestamps:
Timestamp<U>(unit-only) andTimestampTz<U, Z>(unit + timezone). Units:Second,Millisecond,Microsecond,Nanosecond. UseUtcor define your ownZ: TimeZoneSpec. - Decimals:
Decimal128<P, S>andDecimal256<P, S>(precisionP, scaleSas const generics). - Unions:
#[derive(Union)]for enums with#[union(mode = "dense"|"sparse")], per-variant#[union(tag = N)],#[union(field = "name")], and optional null carrier#[union(null)]or container-levelnull_variant = "Var".
Arrow DataType Coverage
Supported (arrow-rs v56):
- Primitives: Int8/16/32/64, UInt8/16/32/64, Float16/32/64, Boolean
- Strings/Binary: Utf8, LargeUtf8, Binary, LargeBinary, FixedSizeBinary (via
[u8; N]) - Temporal: Timestamp (with/without TZ; s/ms/us/ns), Date32/64, Time32(s/ms), Time64(us/ns), Duration(s/ms/us/ns), Interval(YearMonth/DayTime/MonthDayNano)
- Decimal: Decimal128, Decimal256 (const generic precision/scale)
- Nested:
- List (including nullable items), LargeList, FixedSizeList (nullable/non-null items)
- Struct,
- Map (Vec<(K,V)>; use
Option<V>for nullable values), OrderedMap (BTreeMap<K,V>) withkeys_sorted = true - Union: Dense and Sparse (via
#[derive(Union)]on enums) - Dictionary: keys = all integral types; values = Utf8 (String), LargeUtf8, Binary (Vec), LargeBinary, FixedSizeBinary (
[u8; N]), primitives (i*, u*, f32, f64)
Missing:
- BinaryView, Utf8View
- Utf8View
- ListView, LargeListView
- RunEndEncoded
Extensibility
- Derive extension hooks allow user-level customization without changing the core derive:
- Inject compile-time visitors:
#[record(visit(MyVisitor))] - Call your macros per field/record:
#[record(field_macro = my_ext::per_field, record_macro = my_ext::per_record)] - Tag fields/records with free-form markers:
#[record(ext(key))]
- Inject compile-time visitors:
- See
docs/extensibility.mdand the runnable exampleexamples/12_ext_hooks.rs.
Dependencies
~7MB
~131K SLoC