2 unstable releases

0.2.0 Feb 12, 2023
0.1.0 Feb 11, 2023

#2213 in Encoding


Used in pretty-csv

MIT license

28KB
654 lines

CsvStream

Deserialize CSVs, one record at a time.

Usage

use csvstream::{ByteRecord, read_csv};

let mut csv = &b"1,2\n3,4"[..];
let records = read_csv(&mut csv).collect::<Vec<ByteRecord>>();
assert_eq!(records.len(), 2);
assert_eq!(&records[0][0], b"1");
assert_eq!(&records[0][1], b"2");
assert_eq!(&records[1][0], b"3");
assert_eq!(&records[1][1], b"4");

See the docs for more info.


lib.rs:

Deserialize CSVs, one record at a time.

CSV input is assumed to follow RFC 4180, which is the closest thing to a standard that CSV has.

RFC 4180 TL;DR: A CSV is a sequence of records separated by newlines. Records contain fields. Fields are separated by commas. If a field needs to contain a comma, double quote or newline character, wrap that field in double quotes. Within a quoted field, use two double quotes to represent one literal double quote:

// basic normal fields
field 1,field 2,field 3

// Sparse fields
field 1,,field 3,""

// quoted fields. The values of the fields are: `field "1"`, `field "2"`, `field "3"`
"field ""1""","field ""2""", "field ""3"""

// quoted fields with newlines in them
"this field
spans two lines", "this one does not"
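
To make the quoting rules above concrete, here is a minimal sketch of splitting a single record into fields. The function `split_record` is a hypothetical helper written for illustration only; it is not part of csvstream and says nothing about how the crate parses internally. It assumes well-formed input and does no whitespace trimming or error reporting.

```rust
/// Split one CSV record into fields using the RFC 4180 quoting rules.
/// Hypothetical helper for illustration; not part of csvstream.
fn split_record(record: &str) -> Vec<String> {
    let mut fields = Vec::new();
    let mut field = String::new();
    let mut in_quotes = false;
    let mut chars = record.chars().peekable();

    while let Some(c) = chars.next() {
        match (c, in_quotes) {
            // Two double quotes inside a quoted field mean one literal quote.
            ('"', true) if chars.peek() == Some(&'"') => {
                chars.next();
                field.push('"');
            }
            // A lone double quote closes the quoted field.
            ('"', true) => in_quotes = false,
            // A double quote outside a quoted field opens one.
            ('"', false) => in_quotes = true,
            // An unquoted comma ends the current field.
            (',', false) => fields.push(std::mem::take(&mut field)),
            // Everything else is data, including commas and newlines inside quotes.
            (other, _) => field.push(other),
        }
    }
    // The last field has no trailing comma, so push it explicitly.
    fields.push(field);
    fields
}
```

Under these assumptions, `split_record("field 1,,field 3,\"\"")` yields four fields, two of them empty, and a quoted field keeps any embedded commas or newlines as data.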

The RFC says nothing about the following common cases:


// empty records
field 1, field 2

field 5, field 6

// whitespace before/after a field
field 1,    field2
field 3  ,field 4

These are considered valid by the parser and you'll get what you would expect.

CSV headers are not treated differently. They are parsed like any other record.
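
Since the header arrives as an ordinary first record, separating it from the data is left to the caller. A minimal sketch of one way to do that follows. The function `split_header` is a hypothetical helper, not part of csvstream, and the type parameter `R` stands in for whatever record type the parser yields.

```rust
/// Split a stream of records into (header, remaining rows).
/// Hypothetical helper; `R` stands in for the parser's record type.
/// Returns `None` when the input contains no records at all.
fn split_header<R>(mut records: impl Iterator<Item = R>) -> Option<(R, Vec<R>)> {
    // The first record is taken as the header.
    let header = records.next()?;
    // Everything after it is collected as data rows.
    Some((header, records.collect()))
}
```

Collecting the remaining rows into a `Vec` is only for brevity here; to keep processing one record at a time, call `next()` once for the header and keep iterating over the rest.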

Example

You can deserialize anything that implements the BufRead trait:

use csvstream::{ByteRecord, read_csv};

let mut csv = &b"1,2\n3,4"[..];
let records = read_csv(&mut csv).collect::<Vec<ByteRecord>>();
assert_eq!(records.len(), 2);
assert_eq!(&records[0][0], b"1");
assert_eq!(&records[0][1], b"2");
assert_eq!(&records[1][0], b"3");
assert_eq!(&records[1][1], b"4");
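
The example passes `&mut csv` because the standard library implements `BufRead` for byte slices and for mutable references to any `BufRead`. The sketch below shows what "anything that implements BufRead" means in practice, using a hypothetical stand-in function `first_line` rather than the crate's own API. The same generic bound accepts a byte slice, a `std::io::Cursor`, or a `std::io::BufReader` wrapped around a file or socket.

```rust
use std::io::{self, BufRead};

/// Hypothetical stand-in for a parser entry point: reads the first line
/// from any `BufRead` source and strips the trailing line break.
fn first_line<R: BufRead>(mut reader: R) -> io::Result<String> {
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(line
        .trim_end_matches(|c: char| c == '\r' || c == '\n')
        .to_string())
}
```

Calling `first_line(&mut slice)` on a `&[u8]` advances the slice past the consumed bytes, so whatever remains is still available to the caller afterwards.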

This crate also offers a helper function for writing CSV. It's not really serialization, because the input is already a string, but it does escape fields properly where necessary.

You can write to anything that implements the Write trait:

use csvstream::write_csv_record;
use std::str::from_utf8;

let mut output: Vec<u8> = vec![];
write_csv_record(["hello", "world"], &mut output).unwrap();
assert_eq!("hello,world\n", from_utf8(&output).unwrap());
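
To illustrate what "escaping fields properly" involves, here is a minimal sketch of a record writer that applies the RFC 4180 quoting rules. The names `escape_field` and `write_record_sketch` are hypothetical and are not the crate's implementation. The sketch assumes a field is quoted only when it contains a comma, double quote or line break, and that records end with `\n`, as in the example output above.

```rust
use std::io::{self, Write};

/// Quote a field only when it contains a comma, double quote, or line break.
/// Hypothetical helper for illustration; not part of csvstream.
fn escape_field(field: &str) -> String {
    if field.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        // Double every embedded quote, then wrap the whole field in quotes.
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

/// Write one comma-separated, newline-terminated record to any `Write` sink.
fn write_record_sketch<W: Write>(fields: &[&str], out: &mut W) -> io::Result<()> {
    let escaped: Vec<String> = fields.iter().map(|f| escape_field(f)).collect();
    writeln!(out, "{}", escaped.join(","))
}
```

Because the sink is generic over `Write`, the same function works with a `Vec<u8>`, a file, or standard output.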

No runtime deps