CSVSC
A library for building transformation chains on csv files.
Docs
Please visit https://docs.rs/csvsc
csvsc is a framework for building csv file processors.
Imagine you have N csv files with the same structure and you want to use them to produce M other csv files whose contents depend in some way on the originals. That is what csvsc was built for. With it you build a processing chain (a row stream) that reads each input file and generates new output files with your transformations applied.
Quickstart
Start a new binary project with cargo:
$ cargo new --bin miprocesadordecsv
Add csvsc and encoding as dependencies in Cargo.toml:
[dependencies]
csvsc = "2.2"
encoding = "0.2"
Now start building your processing chain. Specify the inputs (one or more csv files), the transformations, and the output.
use csvsc::prelude::*;
let mut chain = InputStreamBuilder::from_paths(&[
    // Put the paths to your source files here, from one to a million
    "test/assets/chicken_north.csv",
    "test/assets/chicken_south.csv",
]).unwrap().build().unwrap()
    // This is where you do the magic: add columns, remove some, filter
    // rows, group and aggregate, or even transpose the data to fit your
    // needs.
    // Specify some (zero, one or many) output targets so that the results
    // of your computations get stored somewhere.
    .flush(Target::path("data/output.csv")).unwrap()
    .into_iter();

// Finally, consume the stream, reporting any errors to stderr.
while let Some(item) = chain.next() {
    if let Err(e) = item {
        eprintln!("{}", e);
    }
}
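Then run the program; since this chain only flushes, data/output.csv will simply contain every row from the input files:
$ cargo run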
Example
Grab your input files; in this case I'll use these two:
chicken_north.csv
month,eggs per week
1,3
1,NaN
1,6
2,
2,4
2,8
3,5
3,1
3,8
chicken_south.csv
month,eggs per week
1,2
1,NaN
1,
2,7
2,8
2,23
3,3
3,2
3,12
Now build your processing chain.
// main.rs
use csvsc::prelude::*;
use encoding::all::UTF_8;

let mut chain = InputStreamBuilder::from_paths(vec![
    "test/assets/chicken_north.csv",
    "test/assets/chicken_south.csv",
]).unwrap()
    // Optionally specify the encoding
    .with_encoding(UTF_8)
    // Optionally add a column with the path of the source file, as given
    // to the builder
    .with_source_col("_source")
    // Build the row stream
    .build().unwrap()
    // Filter out rows with a missing or invalid value in this column
    .filter_col("eggs per week", |value| {
        !value.is_empty() && value != "NaN"
    }).unwrap()
    // Add a column whose value is derived from the source file name
    .add(
        Column::with_name("region")
            .from_column("_source")
            .with_regex("_([a-z]+).csv").unwrap()
            .definition("$1")
    ).unwrap()
    // Group by two columns and compute some aggregates
    .group(["region", "month"], |row_stream| {
        row_stream.reduce(vec![
            Reducer::with_name("region").of_column("region").last("").unwrap(),
            Reducer::with_name("month").of_column("month").last("").unwrap(),
            Reducer::with_name("avg").of_column("eggs per week").average().unwrap(),
            Reducer::with_name("sum").of_column("eggs per week").sum(0.0).unwrap(),
        ]).unwrap()
    })
    // Write a report to a single file that will contain all the data
    .flush(
        Target::path("data/report.csv")
    ).unwrap()
    // This column lets us write to multiple files, in this case one
    // report per month
    .add(
        Column::with_name("monthly report")
            .from_all_previous()
            .definition("data/monthly/{month}.csv")
    ).unwrap()
    .del(vec!["month"])
    // Write every row to the file named by its `monthly report` column
    .flush(
        Target::from_column("monthly report")
    ).unwrap()
    // Pack the processing chain into an iterator that can be consumed
    .into_iter();

// Consuming the iterator is what actually runs all the transformations.
while let Some(item) = chain.next() {
    item.unwrap();
}
This is the output:
data/monthly/1.csv
region,avg,sum
south,2,2
north,4.5,9
data/monthly/2.csv
region,avg,sum
north,6,12
south,12.666666666666666,38
data/monthly/3.csv
region,avg,sum
north,4.666666666666667,14
south,5.666666666666667,17
data/report.csv
region,month,avg,sum
north,2,6,12
south,1,2,2
south,2,12.666666666666666,38
north,3,4.666666666666667,14
south,3,5.666666666666667,17
north,1,4.5,9
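Note that the rows in data/report.csv are not in input order: they come out of the grouping step, which emits one aggregated row per group, in no particular order.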
Dig deeper
Check InputStreamBuilder to see more options for starting a processing chain and reading your input.
Go to the RowStream documentation to see all the available transformations, as well as options for flushing the data to files or standard I/O.
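Before digging into those docs, note that the calls used in this README compose freely. Below is a minimal sketch built only from calls shown above (the paths and the output file name are illustrative): it keeps the source file as a column, drops rows with an empty "eggs per week" value, and flushes everything to one file.

use csvsc::prelude::*;

// A sketch assembled from the calls shown earlier in this README; the
// paths and the output file name are illustrative.
let chain = InputStreamBuilder::from_paths(&[
    "test/assets/chicken_north.csv",
    "test/assets/chicken_south.csv",
]).unwrap()
    // keep the source path around as a regular column
    .with_source_col("_source")
    .build().unwrap()
    // drop rows whose "eggs per week" value is empty
    .filter_col("eggs per week", |value| !value.is_empty()).unwrap()
    .flush(Target::path("data/clean.csv")).unwrap()
    .into_iter();

// Report errors to stderr instead of aborting on the first bad row.
for item in chain {
    if let Err(e) = item {
        eprintln!("{}", e);
    }
}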