Version 0.1.0 (unstable), released Apr 22, 2025. Uses the Rust 2024 edition.
nc-to-pq
A simple command-line tool written in Rust to convert NetCDF4 files (.nc) to Apache Parquet format (.parquet).
This tool uses the peroxide crate to read NetCDF files and write Parquet files.
Important Constraint: Apache Parquet is a columnar storage format. This tool requires that the input NetCDF4 file is structured such that each variable can be read as a 1-dimensional vector, corresponding directly to a column in the output Parquet file. It will likely fail or produce incorrect results if your NetCDF variables have more than one dimension (e.g., grids, matrices).
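Here is what the constraint means in practice. A Parquet table is a set of named columns that all hold the same number of rows, so a 1-D NetCDF variable maps onto one column and a 2-D grid does not. The sketch below is a hypothetical pre-flight check, not code from nc-to-pq or peroxide. The `VarInfo` type and `check_columnar` function are illustrative names, and the sketch assumes you already know each variable's name and dimension lengths.

```rust
/// Hypothetical description of a NetCDF variable: its name and dimension lengths.
/// (Illustrative only; not a type from nc-to-pq or peroxide.)
pub struct VarInfo {
    pub name: String,
    pub shape: Vec<usize>,
}

/// Checks that every variable is 1-D and that all variables share one length,
/// so that each can become one Parquet column. Returns the common row count.
pub fn check_columnar(vars: &[VarInfo]) -> Result<usize, String> {
    let mut rows: Option<usize> = None;
    for v in vars {
        // A column needs exactly one dimension; grids and matrices are rejected.
        if v.shape.len() != 1 {
            return Err(format!(
                "variable `{}` has {} dimensions; expected 1",
                v.name,
                v.shape.len()
            ));
        }
        let len = v.shape[0];
        if let Some(n) = rows {
            // Every column in the table must have the same number of rows.
            if n != len {
                return Err(format!(
                    "variable `{}` has length {}, but earlier variables have {}",
                    v.name, len, n
                ));
            }
        } else {
            rows = Some(len);
        }
    }
    rows.ok_or_else(|| "no variables found".to_string())
}
```

A check like this fails early with a readable message instead of producing a malformed or misleading Parquet file.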
Features
- Reads variables from a NetCDF4 file.
- Writes the data to an Apache Parquet file.
- Supports specifying output file path.
- Supports basic compression options (snappy or uncompressed).
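You can model the two compression choices as a small enum parsed from the `-c` value. This is a sketch, not the tool's source. The `Compression` enum is a hypothetical name. The accepted strings (`default` for uncompressed, `snappy` for Snappy) come from the options listed under Usage.

```rust
use std::str::FromStr;

/// Hypothetical compression setting mirroring the two documented choices.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Compression {
    /// Selected by `default`, or by omitting `-c`: write uncompressed data.
    #[default]
    Uncompressed,
    /// Selected by `snappy`.
    Snappy,
}

impl FromStr for Compression {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Only the two documented values are accepted; anything else is an error.
        match s {
            "default" => Ok(Compression::Uncompressed),
            "snappy" => Ok(Compression::Snappy),
            other => Err(format!(
                "unsupported compression `{other}`; expected `default` or `snappy`"
            )),
        }
    }
}
```

Deriving `Default` with `#[default]` on `Uncompressed` encodes the rule that omitting the option means no compression.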
Installation
Ensure you have Rust and Cargo installed. You can then install nc-to-pq directly from crates.io:
cargo install nc-to-pq
Or, clone the repository and build manually:
git clone https://github.com/Axect/nc-to-pq
cd nc-to-pq
cargo build --release
# The executable will be in ./target/release/nc-to-pq
Usage
nc-to-pq <INPUT_NETCDF_FILE> [OPTIONS]
Arguments:
- <INPUT_NETCDF_FILE>: Path to the input NetCDF4 file (.nc).
Options:
- -o, --output <FILE>: Path for the output Parquet file. If omitted, the output file has the same name as the input file but with a .parquet extension, placed in the same directory.
- -c, --compression <COMPRESSION>: Compression algorithm to use:
  - default: No compression (uncompressed). This is also used when the option is omitted.
  - snappy: Snappy compression.
- -h, --help: Print help information.
- -V, --version: Print version information.
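The default naming rule for `-o` keeps the same directory and file stem and uses a `.parquet` extension. In Rust's standard library this is `Path::with_extension`. The `resolve_output` helper below is a hypothetical sketch of that rule, not the tool's own code.

```rust
use std::path::{Path, PathBuf};

/// Returns the explicit output path if one was given; otherwise replaces the
/// input's extension with `.parquet`, keeping the same directory and file stem.
pub fn resolve_output(input: &Path, output: Option<&Path>) -> PathBuf {
    match output {
        Some(p) => p.to_path_buf(),
        // `with_extension` swaps only the final extension: `data.nc` -> `data.parquet`.
        None => input.with_extension("parquet"),
    }
}
```

Only the last extension is replaced, so `run.v2.nc` becomes `run.v2.parquet`.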
Examples:
- Basic Conversion: Convert data.nc to data.parquet (uncompressed):
  nc-to-pq data.nc
- Specify Output File: Convert input.nc to output_data.parquet:
  nc-to-pq input.nc -o output_data.parquet
- Use Snappy Compression: Convert measurements.nc to measurements.parquet using Snappy compression:
  nc-to-pq measurements.nc -c snappy
- Specify Output and Compression: Convert input.nc to compressed_output.parquet using Snappy:
  nc-to-pq input.nc -o compressed_output.parquet -c snappy
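To convert many files in one run, you can drive the installed binary from a Rust program. This sketch uses only the flags documented in Usage (`-o`, `-c snappy`) and assumes `nc-to-pq` is on your `PATH`. The `build_args` and `convert` helpers are hypothetical names, not part of the tool.

```rust
use std::ffi::OsString;
use std::io;
use std::path::Path;
use std::process::Command;

/// Builds the argument list for one nc-to-pq invocation using only documented flags.
/// OsString keeps non-UTF-8 paths intact.
pub fn build_args(input: &Path, output: Option<&Path>, snappy: bool) -> Vec<OsString> {
    let mut args = vec![input.as_os_str().to_os_string()];
    if let Some(out) = output {
        args.push("-o".into());
        args.push(out.as_os_str().to_os_string());
    }
    if snappy {
        args.push("-c".into());
        args.push("snappy".into());
    }
    args
}

/// Runs nc-to-pq (assumed to be on PATH) and reports a non-zero exit as an error.
pub fn convert(input: &Path, output: Option<&Path>, snappy: bool) -> io::Result<()> {
    let status = Command::new("nc-to-pq")
        .args(build_args(input, output, snappy))
        .status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::other(format!("nc-to-pq exited with {status}")))
    }
}
```

Passing `None` as the output leaves the naming to the tool's default rule.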
License
This project is licensed under the [Your Chosen License, e.g., MIT License] - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests to the GitHub repository.