0.2.2 (1 unstable release), Sep 10, 2023
90KB, 2.5K SLoC
xpq2
xpq2 is a fork of xpq: https://github.com/FabioBatSilva/xpq
xpq is a simple command-line program for analyzing Parquet files.
Fork
xpq2 was forked so I could quickly try my patches to xpq.
I have no intention of maintaining this fork. It might become stale; use xpq instead.
Requirements
- Rust nightly
See "Working with nightly Rust" to install the nightly toolchain and set it as the default.
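If you are not sure whether your active toolchain is already nightly, a small check like the sketch below can help. `is_nightly` is a hypothetical helper that only inspects the string printed by `rustc --version`, assuming nightly builds carry `-nightly` in their version (as in `rustc 1.74.0-nightly (...)`).

```shell
# Hypothetical helper: succeed if a `rustc --version` string is a nightly build.
is_nightly() {
  case "$1" in
    *-nightly*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: install nightly and make it the default only if it is not active yet.
#   is_nightly "$(rustc --version)" || {
#     rustup toolchain install nightly && rustup default nightly
#   }
```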
Installation
Binaries for Linux and macOS are available from GitHub.
To install a binary, download the latest release (this example fetches the macOS build):
curl -s https://api.github.com/repos/FabioBatSilva/xpq/releases/latest \
| grep "browser_download_url" \
| grep apple-darwin \
| cut -d : -f 2,3 \
| tr -d \" \
| wget -qi -
Make it executable and move it onto your PATH (writing to /usr/local/bin may require sudo):
chmod +x ./xpq-*-apple-darwin
mv ./xpq-*-apple-darwin /usr/local/bin/xpq
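The download pipeline above only selects the macOS asset. The sketch below factors the filtering into a function so you can pass a different pattern, for example for the Linux build. `release_asset_urls` is a hypothetical helper, and the pattern `linux` assumes the Linux asset's file name contains that word; check the release page for the actual asset names.

```shell
# Hypothetical helper: read GitHub "latest release" JSON on stdin and print the
# browser_download_url of every asset whose line contains PATTERN ($1).
release_asset_urls() {
  grep '"browser_download_url"' \
    | grep -F -- "$1" \
    | cut -d : -f 2,3 \
    | tr -d '" '
}

# Example (assumes the Linux asset name contains "linux"):
#   curl -s https://api.github.com/repos/FabioBatSilva/xpq/releases/latest \
#     | release_asset_urls linux \
#     | wget -qi -
```

After downloading, make the file executable and move it onto your PATH as shown for macOS.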
Alternatively, you can compile and install the published crate with Cargo:
cargo install xpq
You can also build the latest source from the Git repository with Cargo:
cargo install --git https://github.com/FabioBatSilva/xpq.git --force
Available commands
- read - Read rows.
- count - Show the number of rows.
- schema - Show the Parquet schema.
- sample - Randomly sample rows from a Parquet file.
- frequency - Show frequency counts for each value.
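To make "frequency counts for each value" concrete, here is a rough shell analogue that works on a single column of values, one value per line. It is only an illustration under that assumption: `value_frequencies` is a hypothetical helper, not part of xpq, and xpq's own output format may differ.

```shell
# Hypothetical helper: given one value per line on stdin, print each distinct
# value prefixed by how many times it occurs, most frequent first.
# This illustrates the idea only; it is not how xpq computes frequencies.
value_frequencies() {
  sort | uniq -c | sort -rn
}

# Example:
#   printf 'red\nnull\nred\n' | value_frequencies
```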
Quick tour
Grab some Parquet data:
wget -O users.parquet https://github.com/apache/spark/blob/master/examples/src/main/resources/users.parquet?raw=true
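Parquet files start and end with the 4-byte magic string `PAR1`, so a quick sanity check can catch a download that saved an HTML error page instead of the data. `is_parquet` is a hypothetical helper, not part of xpq, and it checks only those magic bytes, not the rest of the file.

```shell
# Hypothetical helper: succeed if FILE ($1) starts and ends with the
# Parquet magic bytes "PAR1".
is_parquet() {
  [ "$(head -c 4 -- "$1")" = "PAR1" ] && [ "$(tail -c 4 -- "$1")" = "PAR1" ]
}

# Example:
#   is_parquet users.parquet && echo "looks like Parquet"
```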
Check the schema:
xpq schema users.parquet
message example.avro.User {
  REQUIRED BYTE_ARRAY name (UTF8);
  OPTIONAL BYTE_ARRAY favorite_color (UTF8);
  REQUIRED group favorite_numbers (LIST) {
    REPEATED INT32 array;
  }
}
Check the number of rows:
xpq count users.parquet
count
2
Read some data:
xpq read users.parquet
name      favorite_color  favorite_numbers
"Alyssa"  null            [3, 9, 15, 20]
"Ben"     "red"           []
Dependencies: ~26–37MB, ~693K SLoC