34 releases
Version | Release date |
---|---|
0.19.0 | Sep 18, 2024 |
0.18.1 | Jun 8, 2024 |
0.18.0 | Apr 1, 2024 |
0.17.10 | Feb 5, 2024 |
0.2.2 | Mar 5, 2021 |
JSON to Arrow
Convert JSON files to Apache Arrow. This package is part of Arrow CLI tools.
Installation
Download prebuilt binaries
You can get the latest releases from https://github.com/domoritz/arrow-tools/releases.
With Homebrew
brew install domoritz/homebrew-tap/json2arrow
With Cargo
cargo install json2arrow
With Cargo B(inary)Install
To avoid recompilation and speed up installation, you can install this tool with cargo binstall:
cargo binstall json2arrow
Usage
Usage: json2arrow [OPTIONS] <JSON> [ARROW]
Arguments:
<JSON> Input JSON file, stdin if not present
[ARROW] Output file, stdout if not present
Options:
-s, --schema-file <SCHEMA_FILE>
File with Arrow schema in JSON format
-m, --max-read-records <MAX_READ_RECORDS>
The number of records to infer the schema from. All rows if not present. Setting max-read-records to zero will stop schema inference and all columns will be string typed
-p, --print-schema
Print the schema to stderr
-n, --dry
Only print the schema
-h, --help
Print help
-V, --version
Print version
The --schema-file option expects the same schema file format that --dry and --print-schema print.
Examples
For usage examples, see the csv2parquet examples; that tool shares a similar interface.
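When calling json2arrow from a script, the options listed under Usage can be assembled into an argument list. The following is a minimal sketch, not part of json2arrow: `build_command` and `run` are hypothetical helpers, and the file names are placeholders. Only the flags shown in the help text above are used.

```python
import shutil
import subprocess


def build_command(json_path, arrow_path=None, schema_file=None,
                  max_read_records=None, print_schema=False, dry=False):
    """Build a json2arrow argument list from the documented options.

    Hypothetical helper for illustration; it only mirrors the help text.
    """
    cmd = ["json2arrow"]
    if schema_file is not None:
        cmd += ["--schema-file", str(schema_file)]
    if max_read_records is not None:
        cmd += ["--max-read-records", str(max_read_records)]
    if print_schema:
        cmd.append("--print-schema")
    if dry:
        cmd.append("--dry")
    # Positional arguments: input JSON first, optional output file second.
    cmd.append(str(json_path))
    if arrow_path is not None:
        cmd.append(str(arrow_path))
    return cmd


def run(cmd):
    """Run the command if the binary is on PATH; return None otherwise."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, check=True)
```

For example, `run(build_command("data.jsonl", "data.arrow", print_schema=True))` would convert the placeholder file `data.jsonl` and print the schema to stderr, provided json2arrow is installed.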
Limitations
Since we use the Arrow JSON loader, we are limited to what it supports. Right now, it supports line-delimited JSON files, with one JSON object per line:
{ "a": 42, "b": true }
{ "a": 12, "b": false }
{ "a": 7, "b": true }
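If your data is stored as a single top-level JSON array instead, one option is to rewrite it as line-delimited JSON before conversion. The sketch below uses only the Python standard library and assumes the array holds one object per record; `json_array_to_lines` is a hypothetical helper, not part of json2arrow.

```python
import json


def json_array_to_lines(text):
    """Convert a JSON array of records to line-delimited JSON text."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Emit one JSON value per line, each terminated by a newline.
    return "".join(json.dumps(record) + "\n" for record in records)


array_text = '[{"a": 42, "b": true}, {"a": 12, "b": false}]'
print(json_array_to_lines(array_text), end="")
# Prints:
# {"a": 42, "b": true}
# {"a": 12, "b": false}
```

The resulting text can be written to a file and passed to json2arrow as the input.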