4 releases (2 breaking)
Version | Date |
---|---|
0.3.0 | Jul 30, 2022 |
0.2.0 | Jul 28, 2022 |
0.1.1 | Jul 20, 2022 |
0.1.0 | Jul 19, 2022 |
Read Apache ORC from Rust
Read Apache ORC in Rust.
This repository is similar to parquet2 and Avro-schema, providing a toolkit to:
- Read ORC files (proto structures)
- Read stripes (the conversion from proto metadata to memory regions)
- Decode stripes (the math of decoding stripes into e.g. booleans, runs of RLE, etc.)
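To make the "decode stripes" step more concrete, here is a minimal sketch of how a boolean column could be decoded, following the byte run-length encoding described in the ORC specification: a header byte in 0..=127 means a run of `header + 3` copies of the next byte, a negative header (as a signed byte) means `-header` literal bytes follow, and booleans are the bits of the decoded bytes, most significant bit first. The function names `decode_byte_rle` and `decode_booleans` are hypothetical and are not this crate's API.

```rust
/// Decode an ORC byte-RLE stream into raw bytes.
/// Hypothetical helper for illustration, not part of this crate's API.
/// Returns `None` if the input is truncated.
fn decode_byte_rle(input: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        // The header is interpreted as a signed byte.
        let header = input[i] as i8;
        i += 1;
        if header >= 0 {
            // Run: the next byte repeated (header + 3) times.
            let value = *input.get(i)?;
            i += 1;
            let len = header as usize + 3;
            out.extend(std::iter::repeat(value).take(len));
        } else {
            // Literals: (-header) bytes copied verbatim.
            let len = -(header as i16) as usize;
            let literals = input.get(i..i + len)?;
            out.extend_from_slice(literals);
            i += len;
        }
    }
    Some(out)
}

/// Decode `count` booleans: byte-RLE first, then unpack bits MSB first.
/// Hypothetical helper for illustration, not part of this crate's API.
fn decode_booleans(input: &[u8], count: usize) -> Option<Vec<bool>> {
    let bytes = decode_byte_rle(input)?;
    if bytes.len() * 8 < count {
        return None;
    }
    Some(
        bytes
            .iter()
            .flat_map(|&byte| (0..8u32).map(move |bit| (byte >> (7 - bit)) & 1 == 1))
            .take(count)
            .collect(),
    )
}
```

With the specification's example bytes, `decode_booleans(&[0xff, 0x80], 8)` yields one `true` followed by seven `false` values. Returning `Option` keeps the sketch small; a real reader would likely report a proper error for truncated input.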
It currently reads the following (logical) types:
- booleans
- strings
- integers
- floats
What is not yet implemented:
- Snappy, LZO decompression
- RLE v2 Patched Base decoding
- RLE v1 decoding
- Utility functions to decode non-native logical types:
  - decimal
  - timestamp
  - struct
  - List
  - Union
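For orientation on one of the missing pieces, the following is a rough sketch of unsigned integer RLE v1 decoding as described in the ORC specification, not code from this crate. It assumes a header byte in 0..=127 introduces a run of `header + 3` values given by a signed delta byte and a base-128 varint base value, while a negative header introduces `-header` literal varints. Signed columns additionally use zigzag encoding, which is omitted here. The names `read_varint` and `decode_rle_v1_unsigned` are hypothetical.

```rust
/// Read one unsigned base-128 varint; returns (value, bytes consumed).
/// Hypothetical helper for illustration, not part of this crate's API.
fn read_varint(input: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    // A u64 needs at most 10 varint bytes.
    for (i, &byte) in input.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Decode an unsigned integer RLE v1 stream (sketch; zigzag not handled).
/// Returns `None` if the input is truncated.
fn decode_rle_v1_unsigned(input: &[u8]) -> Option<Vec<u64>> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let header = input[i] as i8;
        i += 1;
        if header >= 0 {
            // Run: signed delta byte, then the base value as a varint.
            let delta = *input.get(i)? as i8;
            i += 1;
            let (base, used) = read_varint(input.get(i..)?)?;
            i += used;
            let len = i64::from(header) + 3;
            for k in 0..len {
                let v = (base as i64).wrapping_add(k.wrapping_mul(i64::from(delta)));
                out.push(v as u64);
            }
        } else {
            // Literals: (-header) varints follow.
            let len = -(header as i16);
            for _ in 0..len {
                let (value, used) = read_varint(input.get(i..)?)?;
                i += used;
                out.push(value);
            }
        }
    }
    Some(out)
}
```

Using the specification's examples, `[0x61, 0x00, 0x07]` decodes to one hundred 7s and `[0x61, 0xff, 0x64]` decodes to the sequence 100 down to 1.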
Run tests
python3 -m venv venv
venv/bin/pip install -U pip
venv/bin/pip install -U pyorc
venv/bin/python write.py
cargo test