parquet_to_excel

A Rust crate that converts Parquet file(s) to an Excel or CSV file with constant memory

3 releases (breaking)

new 0.3.0 Feb 16, 2025
0.2.0 Feb 12, 2025
0.1.3 Feb 5, 2025
0.1.2 Feb 5, 2025

#8 in #parquet-file

367 downloads per month

MIT license

28KB
392 lines

parquet_to_excel

A tool, written in Rust, that converts Parquet files to an Excel (xlsx) or CSV file with constant memory. Both a single Parquet file and a folder of Parquet files are supported. It can be called from either Rust or Python. The Python package is also named parquet_to_excel and can be installed with pip install parquet_to_excel. If the installation fails, install Rust and maturin (pip install maturin) first, then try again.
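To illustrate what "constant memory" usually means for a converter like this, here is a minimal Python sketch of the streaming pattern: each batch of rows is written out and then dropped, so memory use depends on the batch size rather than on the file size. It is not the crate's implementation. The names stream_to_csv and demo_batches are hypothetical, and the behaviour of leaving unmapped columns under their original name is an assumption.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, Optional, Sequence


def stream_to_csv(
    columns: Sequence[str],
    batches: Iterable[List[Sequence[object]]],
    out,
    header_labels: Optional[Dict[str, str]] = None,
) -> int:
    """Write row batches to `out` one batch at a time; return the row count."""
    labels = header_labels or {}
    writer = csv.writer(out)
    # Assumption: a column without an entry in header_labels keeps its name.
    writer.writerow([labels.get(name, name) for name in columns])
    written = 0
    for batch in batches:
        # Only the current batch is held in memory; it is released
        # as soon as the loop moves on to the next one.
        writer.writerows(batch)
        written += len(batch)
    return written


def demo_batches() -> Iterator[List[Sequence[object]]]:
    """Hypothetical stand-in for row groups read from a Parquet file."""
    yield [("Acme", 1), ("Globex", 2)]
    yield [("Initech", 3)]


buffer = io.StringIO()
count = stream_to_csv(
    ["gsmc", "col2"],
    demo_batches(),
    buffer,
    {"gsmc": "公司名称", "col2": "Column 2"},
)
```

Because the batches come from a generator, the same function works unchanged whether the source yields three rows or three hundred million.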

Functions

  1. parquet_file_to_csv: convert a single Parquet file to a CSV file
  2. parquet_files_to_csv: convert a folder of Parquet files to a CSV file
  3. parquet_file_to_xlsx: convert a single Parquet file to an Excel file
  4. parquet_files_to_xlsx: convert a folder of Parquet files to an Excel file

Rust Examples

  1. parquet to csv
use std::collections::HashMap;
use parquet_to_excel::csv::{file_to_csv, folder_to_csv};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut headerlabels = HashMap::new();
    headerlabels.insert("gsmc".to_string(), "公司名称".to_string());
    headerlabels.insert("col2".to_string(), "Column 2".to_string());

    //  parquet file to csv
    let source = r"D:\Projects\RustTool\data\.duck\csv_test\source=csv_export.xlsx\data.parquet";
    let writer = r"data\test.csv";
    file_to_csv(source, writer, headerlabels.clone())?;

    //  parquet folder to csv
    let source = r"D:\Projects\RustTool\data\.duck\csv_test";
    let writer = r"data\test1.csv";
    folder_to_csv(source, writer, headerlabels)?;
    
    Ok(())
}
  2. parquet to xlsx
use std::collections::HashMap;
use parquet_to_excel::xlsx::{file_to_xlsx, folder_to_xlsx};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut headerlabels = HashMap::new();
    headerlabels.insert("gsmc".to_string(), "公司名称".to_string());
    headerlabels.insert("col2".to_string(), "Column 2".to_string());

    //  parquet file to xlsx
    let source = r"D:\Projects\RustTool\data\.duck\csv_test\source=csv_export.xlsx\data.parquet";
    let writer = r"data\test.xlsx";
    file_to_xlsx(source, writer, Some("data".into()), None, headerlabels.clone())?;

    //  parquet folder to xlsx
    let source = r"D:\Projects\RustTool\data\.duck\csv_test";
    let writer = r"data\test1.xlsx";
    folder_to_xlsx(source, writer, None, Some("gsmc".into()), headerlabels)?;
    
    Ok(())
}

Python Example

  1. parquet to xlsx
from parquet_to_excel import parquet_file_to_xlsx, parquet_files_to_xlsx

# the last three arguments are optional
parquet_file_to_xlsx(r"data\result\qid=160\a.parquet", r"out1.xlsx", "data", "", {"ddbm": "地点编码"})
parquet_files_to_xlsx(r"data\result\qid=160", r"out2.xlsx", "", "scfs", {"ddbm": "地点编码"})
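Since both a single file and a folder are supported, a small wrapper can pick the right function from the source path. The sketch below is hypothetical and not part of the package: parquet_to_xlsx is an invented name, and sheet_name and sheet_column are guesses at what the third and fourth positional arguments mean, since the source does not name them. The two converters are passed in as parameters, so the helper only relies on the positional call shape shown in the example above.

```python
import os
from typing import Callable, Dict, Optional

# Call shape copied from the example above:
# (source, target, third_arg, fourth_arg, header_labels)
Converter = Callable[[str, str, str, str, Dict[str, str]], object]


def parquet_to_xlsx(
    source: str,
    target: str,
    file_fn: Converter,
    folder_fn: Converter,
    sheet_name: str = "",      # guessed meaning of the third argument
    sheet_column: str = "",    # guessed meaning of the fourth argument
    header_labels: Optional[Dict[str, str]] = None,
) -> str:
    """Dispatch to the folder or single-file converter; return which ran."""
    labels = header_labels or {}
    if os.path.isdir(source):
        folder_fn(source, target, sheet_name, sheet_column, labels)
        return "folder"
    file_fn(source, target, sheet_name, sheet_column, labels)
    return "file"


# Intended use with the real package (not executed here):
# from parquet_to_excel import parquet_file_to_xlsx, parquet_files_to_xlsx
# parquet_to_xlsx(r"data\result\qid=160", "out2.xlsx",
#                 parquet_file_to_xlsx, parquet_files_to_xlsx,
#                 sheet_column="scfs", header_labels={"ddbm": "地点编码"})
```

Passing the converters in explicitly also makes the helper easy to exercise with stand-in functions before running it against large inputs.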

Dependencies

~28–39MB
~740K SLoC