csvsc

Build processing chains for CSV files

12 stable releases

2.2.1 Mar 30, 2022
2.2.0 Mar 29, 2022
2.1.0 Mar 27, 2021
1.4.0 Dec 31, 2020
0.1.0 Jun 30, 2020

#525 in Text processing

31 downloads per month

MIT license

150KB
4K SLoC

CSVSC

A library for building transformation chains on csv files.

Docs

Please visit https://docs.rs/csvsc


lib.rs:

csvsc is a framework for building csv file processors.

Imagine you have N csv files with the same structure and you want to use them to produce M other csv files whose contents depend in some way on the originals. This is what csvsc was built for. With it you can build a processing chain (a row stream) that reads each of the input files and generates new output files with the modifications applied.
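To make the idea of a row stream concrete, here is a conceptual sketch that uses only standard-library iterators. It is not csvsc code: `Row`, `parse_rows` and `drop_invalid` are hypothetical names invented for this illustration, and the parser assumes plain comma-separated text with no quoting.

```rust
// Conceptual sketch only: std iterators standing in for a row stream.
// `Row`, `parse_rows` and `drop_invalid` are hypothetical, not csvsc API.

type Row = Vec<String>;

/// Turn CSV-like text (no quoting support) into a lazy stream of data rows.
/// The header line is consumed; rows of the wrong width become errors.
fn parse_rows(text: &str) -> impl Iterator<Item = Result<Row, String>> + '_ {
    let mut lines = text.lines();
    let width = lines.next().map_or(0, |header| header.split(',').count());
    lines.enumerate().map(move |(i, line)| {
        let row: Row = line.split(',').map(str::to_string).collect();
        if row.len() == width {
            Ok(row)
        } else {
            // `i` counts data lines from zero, and the header was line 1.
            Err(format!("line {}: expected {} fields, found {}", i + 2, width, row.len()))
        }
    })
}

/// One link of the chain: errors pass through untouched, rows whose column
/// `col` is empty or "NaN" are dropped.
fn drop_invalid<I>(rows: I, col: usize) -> impl Iterator<Item = Result<Row, String>>
where
    I: Iterator<Item = Result<Row, String>>,
{
    rows.filter(move |item| match item {
        Ok(row) => row.get(col).map_or(false, |v| !v.is_empty() && v != "NaN"),
        Err(_) => true,
    })
}
```

Calling `drop_invalid(parse_rows(text), 1)` builds the chain but reads nothing until the iterator is consumed, which is the property that lets a chain of this shape process many files without holding them all in memory.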

Quickstart

Start a new binary project with cargo:

$ cargo new --bin miprocesadordecsv

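Add csvsc as a dependency in Cargo.toml. The longer example below also imports the encoding crate, so add that too if you want to set the input encoding explicitly.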

[dependencies]
csvsc = "2.2"

Now start building your processing chain. Specify the inputs (one or more csv files), the transformations, and the output.

use csvsc::prelude::*;

let mut chain = InputStreamBuilder::from_paths(&[
        // Put here the path to your source files, from 1 to a million
        "test/assets/chicken_north.csv",
        "test/assets/chicken_south.csv",
    ]).unwrap().build().unwrap()

    // Here is where you do the magic: add columns, remove ones, filter
    // the rows, group and aggregate, even probably transpose the data
    // to fit your needs.

    // Specify some (zero, one or many) output targets so that results of
    // your computations get stored somewhere.
    .flush(Target::path("data/output.csv")).unwrap()

    .into_iter();

// And finally consume the stream, reporting any errors to stderr.
while let Some(item) = chain.next() {
    if let Err(e) = item {
        eprintln!("{}", e);
    }
}
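The consuming loop above prints errors and moves on. If you also want to know how many rows failed, for instance to exit with a non-zero status, the same loop can be wrapped in a small helper. This is a sketch with a hypothetical function name (`drain`), written against any iterator of `Result` items rather than against a specific csvsc type.

```rust
use std::fmt::Display;

/// Consume a stream of results, printing each error to stderr.
/// Returns `(items that were Ok, items that were Err)`.
/// `drain` is a hypothetical helper, not part of csvsc.
fn drain<T, E: Display>(stream: impl Iterator<Item = Result<T, E>>) -> (usize, usize) {
    let mut ok = 0;
    let mut failed = 0;
    for item in stream {
        match item {
            Ok(_) => ok += 1,
            Err(e) => {
                eprintln!("{}", e);
                failed += 1;
            }
        }
    }
    (ok, failed)
}
```

Under that assumption, `let (ok, failed) = drain(chain);` would replace the `while let` loop, and the caller decides what a non-zero `failed` should mean.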

Example

Grab your input files. In this case I'll use these two:

chicken_north.csv

month,eggs per week
1,3
1,NaN
1,6
2,
2,4
2,8
3,5
3,1
3,8

chicken_south.csv

month,eggs per week
1,2
1,NaN
1,
2,7
2,8
2,23
3,3
3,2
3,12

Now build your processing chain.

// main.rs
use csvsc::prelude::*;

use encoding::all::UTF_8;

let mut chain = InputStreamBuilder::from_paths(vec![
        "test/assets/chicken_north.csv",
        "test/assets/chicken_south.csv",
    ]).unwrap()

    // optionally specify the encoding
    .with_encoding(UTF_8)

    // optionally add a column with the path of the source file as specified
    // in the builder
    .with_source_col("_source")

    // build the row stream
    .build().unwrap()

    // Filter out rows with invalid values in this column
    .filter_col("eggs per week", |value| {
        value.len() > 0 && value != "NaN"
    }).unwrap()

    // add a column with a value obtained from the filename ¡wow!
    .add(
        Column::with_name("region")
            .from_column("_source")
            .with_regex("_([a-z]+).csv").unwrap()
            .definition("$1")
    ).unwrap()

    // group by two columns, compute some aggregates
    .group(["region", "month"], |row_stream| {
        row_stream.reduce(vec![
            Reducer::with_name("region").of_column("region").last("").unwrap(),
            Reducer::with_name("month").of_column("month").last("").unwrap(),
            Reducer::with_name("avg").of_column("eggs per week").average().unwrap(),
            Reducer::with_name("sum").of_column("eggs per week").sum(0.0).unwrap(),
        ]).unwrap()
    })

    // Write a report to a single file that will contain all the data
    .flush(
        Target::path("data/report.csv")
    ).unwrap()

    // This column will allow us to output to multiple files, in this case
    // a report by month
    .add(
        Column::with_name("monthly report")
            .from_all_previous()
            .definition("data/monthly/{month}.csv")
    ).unwrap()

    .del(vec!["month"])

    // Write every row to a file specified by its `monthly report` column added
    // previously
    .flush(
        Target::from_column("monthly report")
    ).unwrap()

    // Pack the processing chain into an iterator that can be consumed.
    .into_iter();

// Consuming the iterator actually triggers all the transformations.
while let Some(item) = chain.next() {
    item.unwrap();
}
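The two `add` steps in the chain derive new values from existing ones: `region` comes from the source path, and `monthly report` from a path template. The sketch below shows what those two derivations compute, using only the standard library. Both functions are hypothetical stand-ins: `region_from_path` replaces the regular expression with plain string operations (so it is stricter than the pattern itself), and `fill_template` assumes that `{name}` is swapped for the value of the column with that name.

```rust
/// Stand-in for the pattern `_([a-z]+).csv`: return the run of lowercase
/// letters that sits between the last underscore and a `.csv` suffix.
/// Hypothetical helper, not csvsc API.
fn region_from_path(path: &str) -> Option<&str> {
    let stem = path.strip_suffix(".csv")?;
    let (_, tail) = stem.rsplit_once('_')?;
    if !tail.is_empty() && tail.chars().all(|c| c.is_ascii_lowercase()) {
        Some(tail)
    } else {
        None
    }
}

/// Replace each `{name}` placeholder in `template` with the value of the
/// column called `name`. Placeholders with no matching column are left
/// as they are (an assumption of this sketch).
fn fill_template(template: &str, row: &[(&str, &str)]) -> String {
    let mut out = template.to_string();
    for (name, value) in row {
        out = out.replace(&format!("{{{}}}", name), value);
    }
    out
}
```

With these, `region_from_path("test/assets/chicken_north.csv")` yields `Some("north")`, and filling `"data/monthly/{month}.csv"` for a row whose `month` is `2` yields `data/monthly/2.csv`, which matches the file names in the output listed next.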

This is what comes as output:

data/monthly/1.csv

region,avg,sum
south,2,2
north,4.5,9

data/monthly/2.csv

region,avg,sum
north,6,12
south,12.666666666666666,38

data/monthly/3.csv

region,avg,sum
north,4.666666666666667,14
south,5.666666666666667,17

data/report.csv

region,month,avg,sum
north,2,6,12
south,1,2,2
south,2,12.666666666666666,38
north,3,4.666666666666667,14
south,3,5.666666666666667,17
north,1,4.5,9
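The numbers above can be checked by hand. The following sketch repeats the filter, group and reduce steps with the standard library only, so the arithmetic is easy to follow. `aggregate` is a hypothetical function written for this check, and it takes the region as an explicit field instead of deriving it from a file name.

```rust
use std::collections::BTreeMap;

/// Group `(region, month, raw value)` records and return `(average, sum)`
/// per group. Empty and "NaN" values are skipped, mirroring the
/// `filter_col` step of the example chain.
fn aggregate(records: &[(&str, u32, &str)]) -> BTreeMap<(String, u32), (f64, f64)> {
    // Running (sum, count) for every (region, month) pair.
    let mut totals: BTreeMap<(String, u32), (f64, usize)> = BTreeMap::new();
    for &(region, month, raw) in records {
        if raw.is_empty() || raw == "NaN" {
            continue;
        }
        // Assumption: every value that survives the filter is numeric.
        let value: f64 = raw.parse().expect("numeric value");
        let entry = totals.entry((region.to_string(), month)).or_insert((0.0, 0));
        entry.0 += value;
        entry.1 += 1;
    }
    totals
        .into_iter()
        .map(|(key, (sum, count))| (key, (sum / count as f64, sum)))
        .collect()
}
```

For north in month 1 the surviving values are 3 and 6, giving a sum of 9 and an average of 4.5, as in the report. Note that this sketch returns the groups in sorted order, whereas the sample report lists them in no particular order.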

Dig deeper

Check InputStreamBuilder to see more options for starting a processing chain and reading your input.

Go to the RowStream documentation to see all the transformations available as well as options to flush the data to files or standard I/O.

Dependencies

~5MB
~86K SLoC