30 releases
| Version | Released |
|---|---|
| 0.9.0 | Oct 17, 2024 |
| 0.8.13 | Jul 19, 2024 |
| 0.8.9 | Feb 10, 2024 |
| 0.8.8 | Oct 2, 2023 |
| 0.7.0 | Nov 25, 2022 |
Rust library behind the Python version.
Converts CSV files into XLSX, SQLite, PostgreSQL, or Parquet, fast.
Aims
- Thorough type guessing of CSV columns, so there is no need to configure the type of each field. The whole file is scanned first to make sure all values in a column are consistent. Detects over 30 date/time formats as well as JSON data.
- Quick conversions and type guessing (uses Rust underneath). Uses fast methods specific to each output format:
  - `copy` for Postgres
  - Prepared statements for SQLite using the C API
  - Arrow reader for Parquet
  - Write-only mode for libxlsxwriter
- Tries to limit errors when inserting data into a database by falling back to "text" when type guessing can't determine a more specific type.
- When inserting into existing databases, automatically migrates the schema of the target to allow for new data (`evolve` option).
- Memory efficient. All CSVs and outputs are streamed, so conversions should take up very little memory.
- Gathers stats and information about CSV files into a `datapackage.json` file, which can then be used to customize the conversion.
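
The whole-column type guessing with a "text" fallback described above can be sketched roughly as follows. This is an illustration in plain Python, not the library's implementation: the function names `guess_value_type` and `guess_column_type`, the type labels, and the three date formats are all assumptions made for the example (the library itself reports detecting over 30 formats).

```python
import json
from datetime import datetime

# Hypothetical, tiny subset of date/time formats, for illustration only.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%Y-%m-%dT%H:%M:%S"]


def guess_value_type(value: str) -> str:
    """Guess the most specific type label for one non-empty cell."""
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "number"
    except ValueError:
        pass
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return "datetime"
        except ValueError:
            continue
    # Only treat objects and arrays as JSON; bare scalars stay text.
    if value[:1] in ("{", "["):
        try:
            json.loads(value)
            return "json"
        except ValueError:
            pass
    return "text"


def guess_column_type(values) -> str:
    """Scan every value in a column and settle on one consistent type."""
    seen = {guess_value_type(v) for v in values if v != ""}
    if len(seen) == 1:
        return seen.pop()
    if seen == {"integer", "number"}:
        return "number"  # integers can be widened to numbers
    # Mixed or empty columns fall back to the safest type.
    return "text"
```

Because every value is inspected before a type is chosen, a single stray string in an otherwise numeric column yields "text" instead of a failed insert later on.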
Drawbacks
- CSV files currently need header rows.
- The whole file needs to be on disk: the entire CSV is analyzed before conversion, so each file is read twice.
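
To make these two drawbacks concrete, here is a minimal two-pass sketch using only Python's standard `csv` and `sqlite3` modules. It is not the library's code: `csv_to_sqlite`, `cell_rank`, and `quote` are hypothetical names, the type guessing is reduced to INTEGER/REAL/TEXT with Python's `int` and `float` as a rough proxy, and rows are assumed to have the same number of fields as the header.

```python
import csv
import sqlite3

SQL_TYPES = ["INTEGER", "REAL", "TEXT"]  # most to least specific


def cell_rank(value: str) -> int:
    """Return the index in SQL_TYPES of the most specific type that fits."""
    for rank, cast in enumerate((int, float)):
        try:
            cast(value)
            return rank
        except ValueError:
            pass
    return 2


def quote(name: str) -> str:
    """Quote an SQL identifier taken from the CSV header."""
    return '"' + name.replace('"', '""') + '"'


def csv_to_sqlite(csv_path, conn, table):
    # Pass 1: stream the file once to settle on one type per column.
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # the header row supplies the column names
        ranks = [-1] * len(header)  # -1 means "no value seen yet"
        for row in reader:
            for i, value in enumerate(row[: len(header)]):
                if value != "":
                    ranks[i] = max(ranks[i], cell_rank(value))
    # Columns with no values at all fall back to TEXT.
    types = [SQL_TYPES[r] if r >= 0 else "TEXT" for r in ranks]
    columns = ", ".join(f"{quote(n)} {t}" for n, t in zip(header, types))
    conn.execute(f"CREATE TABLE {quote(table)} ({columns})")

    # Pass 2: stream the file again into one parameterized INSERT.
    placeholders = ", ".join("?" * len(header))
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        rows = ([v if v != "" else None for v in row] for row in reader)
        conn.executemany(
            f"INSERT INTO {quote(table)} VALUES ({placeholders})", rows
        )
    conn.commit()
    return dict(zip(header, types))
```

The table cannot be created until the first pass has seen every row, which is why the input has to be a re-readable file on disk and not a one-shot stream. Neither pass holds more than one row in memory at a time.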