
orc-format

Unofficial implementation of the Apache ORC spec in safe Rust

4 releases (2 breaking)

0.3.0 Jul 30, 2022
0.2.0 Jul 28, 2022
0.1.1 Jul 20, 2022
0.1.0 Jul 19, 2022

#1407 in Encoding


1,993 downloads per month
Used in 2 crates

MIT/Apache

62KB
1.5K SLoC

Read Apache ORC from Rust


Read Apache ORC in Rust.

This repository is similar to parquet2 and Avro-schema, providing a toolkit to:

  • Read ORC files (proto structures)
  • Read stripes (the conversion from proto metadata to memory regions)
  • Decode stripes (the math of decoding stripe data into e.g. booleans, runs of RLE, etc.)
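To make the "decode" step concrete, here is a minimal sketch of how the ORC specification describes byte run-length encoding and boolean bit-packing. It is not this crate's API: the function names `decode_byte_rle` and `unpack_booleans` are hypothetical and exist only for illustration.

```rust
/// Decodes an ORC "byte run length encoding" stream into raw bytes.
/// Hypothetical helper, not part of orc-format.
/// Returns `None` if the input ends in the middle of a run.
pub fn decode_byte_rle(input: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < input.len() {
        // The control byte is interpreted as a signed value.
        let header = input[pos] as i8;
        pos += 1;
        if header >= 0 {
            // Run: the next byte is repeated `header + 3` times (3..=130).
            let value = *input.get(pos)?;
            pos += 1;
            let len = header as usize + 3;
            out.extend(std::iter::repeat(value).take(len));
        } else {
            // Literals: the next `-header` bytes (1..=128) are copied verbatim.
            let len = -(header as isize) as usize;
            let literals = input.get(pos..pos + len)?;
            out.extend_from_slice(literals);
            pos += len;
        }
    }
    Some(out)
}

/// Unpacks `count` booleans from bytes, most significant bit first.
/// Hypothetical helper, not part of orc-format.
pub fn unpack_booleans(bytes: &[u8], count: usize) -> Vec<bool> {
    bytes
        .iter()
        .flat_map(|&byte| (0..8u32).rev().map(move |bit| (byte >> bit) & 1 == 1))
        .take(count)
        .collect()
}
```

With these helpers, the bytes `[0x61, 0x00]` decode to one hundred zero bytes, and `[0xff, 0x80]` decodes to a single byte `0x80`, which unpacks to one `true` followed by seven `false` values.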

It currently reads the following (logical) types:

  • booleans
  • strings
  • integers
  • floats

What is not yet implemented:

  • Snappy, LZO decompression
  • RLE v2 Patched Base decoding
  • RLE v1 decoding
  • Utility functions to decode non-native logical types:
    • decimal
    • timestamp
    • struct
    • list
    • union
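For readers wondering what the missing RLE v1 decoder involves, the following is a rough sketch of the integer RLE v1 layout as the ORC specification describes it, restricted to unsigned streams (signed streams additionally zigzag-encode each varint, which is not handled here). The names `read_varint` and `decode_rle_v1_unsigned` are hypothetical and are not part of this crate.

```rust
/// Reads one base-128 varint (7 bits per byte, least significant group first)
/// starting at `*pos`, advancing `*pos` past it. Hypothetical helper.
fn read_varint(input: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = *input.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            // More groups than fit in a u64: treat as malformed input.
            return None;
        }
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
        shift += 7;
    }
}

/// Decodes an unsigned integer RLE v1 stream. Hypothetical helper.
/// Returns `None` on truncated or malformed input.
pub fn decode_rle_v1_unsigned(input: &[u8]) -> Option<Vec<u64>> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < input.len() {
        let header = input[pos] as i8;
        pos += 1;
        if header >= 0 {
            // Run: `header + 3` values, starting at a varint base and
            // changing by a fixed signed one-byte delta each step.
            let delta = i64::from(*input.get(pos)? as i8);
            pos += 1;
            let base = read_varint(input, &mut pos)?;
            let len = i64::from(header) + 3;
            for i in 0..len {
                out.push(base.wrapping_add((delta * i) as u64));
            }
        } else {
            // Literals: `-header` varints follow.
            let len = -i16::from(header);
            for _ in 0..len {
                out.push(read_varint(input, &mut pos)?);
            }
        }
    }
    Some(out)
}
```

Under these assumptions, `[0x61, 0x00, 0x07]` decodes to one hundred 7s, `[0x61, 0xff, 0x64]` to the sequence 100 down to 1, and `[0xfb, 0x02, 0x03, 0x06, 0x07, 0x0b]` to the literals 2, 3, 6, 7, 11.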

Run tests

python3 -m venv venv
venv/bin/pip install -U pip
venv/bin/pip install -U pyorc
venv/bin/python write.py
cargo test

Dependencies

~2.5MB
~58K SLoC