39 releases
Version | Released |
---|---|
0.20.0 | Dec 15, 2024 |
0.19.0 | Sep 18, 2024 |
0.18.1 | Jun 8, 2024 |
0.17.10 | Feb 5, 2024 |
0.2.1 | Mar 5, 2021 |
CSV to Arrow
Convert CSV files to Apache Arrow. This package is part of Arrow CLI tools.
Installation
Download prebuilt binaries
You can get the latest releases from https://github.com/domoritz/arrow-tools/releases.
With Homebrew
brew install domoritz/homebrew-tap/csv2arrow
With Cargo
cargo install csv2arrow
With Cargo B(inary)Install
To avoid re-compilation and speed up installation, you can install this tool with cargo binstall:
cargo binstall csv2arrow
Usage
Usage: csv2arrow [OPTIONS] <CSV> [ARROW]
Arguments:
<CSV>
Input CSV file, stdin if not present
[ARROW]
Output file, stdout if not present
Options:
-s, --schema-file <SCHEMA_FILE>
File with Arrow schema in JSON format
-m, --max-read-records <MAX_READ_RECORDS>
The number of records to infer the schema from. All rows if not present. Setting max-read-records to zero will stop schema inference and all columns will be string typed
--header <HEADER>
Set whether the CSV file has headers
[default: true]
[possible values: true, false]
--delimiter <DELIMITER>
Set the CSV file's column delimiter as a byte character
--escape <ESCAPE>
Specify an escape character
--quote <QUOTE>
Specify a custom quote character
--comment <COMMENT>
Specify a comment character.
Lines starting with this character will be ignored
--null-regex <NULL_REGEX>
Provide a regex to match null values
-p, --print-schema
Print the schema to stderr
-n, --dry
Only print the schema
-h, --help
Print help (see a summary with '-h')
-V, --version
Print version
The --schema-file option uses the same file format as --dry and --print-schema.
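As a rough sketch of that round trip, the commands below write a small sample file, capture the inferred schema with --dry, and pass it back with --schema-file. The file names and sample data are placeholders. The redirect assumes --dry writes the schema JSON to stdout; if your version writes it to stderr, redirect with 2> instead.

```shell
# Write a tiny sample file (hypothetical data, for illustration only).
printf 'id,name\n1,alice\n2,bob\n' > sample.csv

# Run the conversion only if csv2arrow is installed.
if command -v csv2arrow >/dev/null 2>&1; then
  # Assumption: --dry prints the schema JSON to stdout.
  csv2arrow --dry sample.csv > schema.json
  # Reuse the captured schema instead of inferring it again.
  csv2arrow --schema-file schema.json sample.csv sample.arrow
fi
```

Editing schema.json between the two steps is one way to override an inferred column type before the conversion runs.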
Examples
For usage examples, see the examples for csv2parquet, which shares a similar interface.
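In the meantime, here is a minimal sketch built only from the options listed under Usage. The file names and sample data are hypothetical, and the exact quoting needed for the delimiter may depend on your shell.

```shell
# Write a semicolon-delimited sample without a header row (hypothetical data).
printf '1;alice\n2;bob\n' > people.csv

# Run the conversions only if csv2arrow is installed.
if command -v csv2arrow >/dev/null 2>&1; then
  # Declare that there is no header row and set the column delimiter.
  csv2arrow --header false --delimiter ';' people.csv people.arrow
  # With the output argument omitted, the Arrow data goes to stdout.
  csv2arrow --header false --delimiter ';' people.csv > people_stdout.arrow
fi
```

Adding -p to either command should also print the inferred schema to stderr, which is a quick way to check the column types.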