9 releases
0.3.5 | May 27, 2019 |
---|---|
0.3.4 | May 27, 2019 |
0.2.0 | May 3, 2019 |
0.1.1 | Apr 30, 2019 |
#13 in #tsv
30 downloads per month
22KB
502 lines
CSV Guillotine
Purpose
Many banks, stockbrokers and other large institutions will allow you to download your account history in a CSV file. This is good and to be applauded but they often include an extra metadata header at the top of the file explaining what it is. The CSV file may look something like the following:
Account:,****07493
£4536.24
£4536.24
Transaction type,Description,Paid out,Paid in,Balance
Bank credit YOUR EMPLOYER,Bank credit YOUR EMPLOYER,,£2016.12,£4536.24
Direct debit CREDIT PLASTIC,CREDIT PLASTIC,£402.98,,£520.12
For many users this is fine as it can still be loaded into a spreadsheet application.
For my use case, I need to download many of these files, which makes up one large data set and these extra metadata headers are quite an issue because I can no longer use xsv to parse them directly.
This library is a form of buffer which removes this metadata header. It does this by looking at the field count in a given number of rows and removes the lines before the maximum is reached.
Compiling
To compile install rust from rustup, check out this repository and run:
cargo install --path .
Command Line Usage
This can be used like the following:
cat with_metadata_headers.csv | csv-guillotine --separator=',' --consider=20 > csv_header_and_data only.csv
or
csv-guillotine -i with_metadata_headers.csv -o csv_header_and_data only.csv
see csv-guillotine --help
for full usage instructions
Errors will be printed to STDERR and their existence can be detected via the exit status.
NOTE: This software makes no attempt to actually validate that your CSV.
Library Usage
This library exposes a Blade
class which is constructed with a Read
as well as a character (expressed as a u8) and a line limit. The Blade
class can be used as a Read
to get the actual data out.
Example below:
extern crate csv_guillotine;
use std::io::{BufRead, BufReader};
use csv_guillotine::Blade;
fn main() {
let stdin = std::io::stdin();
let blade = Blade::new(stdin, 44, 20);
let mut buf_reader = BufReader::new(blade);
let mut read_size = 1;
while read_size != 0 {
let mut buffer = String::new();
match buf_reader.read_line(&mut buffer) {
Ok(r) => {
print!("{}", buffer);
read_size = r;
},
Err(e) => {
eprintln!("ERROR: {}", e);
}
}
}
}
Versions
- 0.3.5 - Allow tab seperated CSV files (AKA TSV) in binary.
- 0.3.4 - Convert Blade (lib.rs) to Blade (a generic) instead of using a Box field internally.
- 0.3.3 - More normal project layout and nicer code.
- 0.3.2 - More normal project layout and nicer code.
- 0.3.1 - Improve test coverage and fix bugs.
- 0.3.0 - Use bytes instead of String for everything so it can process non UTF8 files.
- 0.2.0 - Add a command line program
- 0.1.1 - Rename main class to Blade to keep with the guillotine theme
- 0.1.0 - Initial version
Dependencies
~1.5MB
~19K SLoC