8 unstable releases (3 breaking)
Version | Released |
---|---|
0.4.0 | May 21, 2023 |
0.3.0 | May 20, 2023 |
0.2.1 | May 19, 2023 |
0.1.3 | May 17, 2023 |
sanitise
A library for headache-free data clean-up and validation.
sanitise is a CSV processing and validation library that generates code at compile time based on a YAML configuration file. The generated code is robust and will not panic.
no_std environments are supported, but the alloc crate is required.
Quick Start
Add sanitise to the dependencies in your Cargo.toml:
[dependencies]
sanitise = "0.1"
Import the macro:
use sanitise::sanitise_string;
And call:
// main.rs
use std::{fs, iter::zip};
use sanitise::sanitise_string;

fn main() {
    let csv = fs::read_to_string("data.csv").unwrap();
    let ((time_millis, pulse, movement), (time_secs,)) =
        sanitise_string!(include_str!("sanitise_config.yaml"), &csv).unwrap();
    println!("time_millis,time_secs,pulse,movement");
    for (((time_millis, pulse), movement), time_secs) in zip(zip(zip(time_millis, pulse), movement), time_secs) {
        println!("{time_millis},{time_secs},{pulse},{movement}")
    }
}
# sanitise_config.yaml
processes:
  - name: validate
    columns:
      - title: time
        type: integer
      - title: pulse
        type: integer
        max: 100
        min: 40
        on-invalid: average
        valid-streak: 3
      - title: movement
        type: integer
        valid-values: [0, 1]
        output-type: boolean
        output: "value == 1"
  - name: process
    columns:
      - title: time
        type: integer
        output: "value / 1000"
      - title: pulse
        type: integer
        ignore: true
      - title: movement
        type: integer
        ignore: true
# data.csv
time,pulse,movement
0,67,0
15,45,1
126,132,1
The first argument to sanitise_string! must be either a string literal or a macro call that expands to a string literal. The second argument must be an expression that resolves to a &str in CSV format. In the above example, sanitise_config.yaml must be next to main.rs, and data.csv must be in the working directory at runtime.
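To make the example configuration more concrete, here is a hand-written sketch of roughly what the movement and time columns describe, using only the standard library. It is not the code the macro generates. The function names (parse_columns, validate_movement, to_seconds), the i64 column type, and the String error type are all assumptions made for illustration, and the pulse averaging is left out because its exact semantics are defined by the specification.

```rust
// Illustrative only: a hand-written approximation, NOT the code sanitise generates.
// All names and types below are hypothetical.

/// Parse CSV text with a `time,pulse,movement` header into one Vec per column.
fn parse_columns(csv: &str) -> Result<(Vec<i64>, Vec<i64>, Vec<i64>), String> {
    let mut time = Vec::new();
    let mut pulse = Vec::new();
    let mut movement = Vec::new();
    // Skip the header row, then split each record on commas.
    for (index, line) in csv.lines().enumerate().skip(1) {
        let fields: Vec<&str> = line.split(',').map(str::trim).collect();
        if fields.len() != 3 {
            return Err(format!(
                "line {}: expected 3 fields, found {}",
                index + 1,
                fields.len()
            ));
        }
        let parse = |s: &str| {
            s.parse::<i64>()
                .map_err(|e| format!("line {}: {e}", index + 1))
        };
        time.push(parse(fields[0])?);
        pulse.push(parse(fields[1])?);
        movement.push(parse(fields[2])?);
    }
    Ok((time, pulse, movement))
}

/// Mirrors `valid-values: [0, 1]` with `output: "value == 1"`:
/// reject anything other than 0 or 1, then convert to a boolean.
fn validate_movement(value: i64) -> Result<bool, String> {
    match value {
        0 | 1 => Ok(value == 1),
        other => Err(format!("invalid movement value: {other}")),
    }
}

/// Mirrors `output: "value / 1000"` (integer division, assumed).
fn to_seconds(time_millis: i64) -> i64 {
    time_millis / 1000
}

fn main() -> Result<(), String> {
    let csv = "time,pulse,movement\n0,67,0\n15,45,1\n126,132,1\n";
    let (time_millis, _pulse, movement) = parse_columns(csv)?;
    let movement: Vec<bool> = movement
        .into_iter()
        .map(validate_movement)
        .collect::<Result<_, _>>()?;
    let time_secs: Vec<i64> = time_millis.iter().copied().map(to_seconds).collect();
    for ((millis, secs), moved) in time_millis.iter().zip(&time_secs).zip(&movement) {
        println!("{millis},{secs},{moved}");
    }
    Ok(())
}
```

As with the macro's output in the example above, the sketch keeps data column-oriented (one Vec per column) and zips the columns back together only when printing.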
The other macro, sanitise!, is used when your data has already been parsed into the correct shape. See the documentation for more details.
Configuration
For details on the configuration file, see the specification.
Optional features
benchmark: Print the time taken to complete various stages of the process. Disables no_std support. You probably don't want this.
Efficiency
The macro creates linear finite automata to process each column. If on-invalid is set to average for a given column, that column's automaton will use a state machine to keep track of valid and invalid values. If a column is ignored, no automaton will be generated for it. All data is stored in native Rust types.
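As a rough illustration of the state-machine idea, the sketch below tracks valid and invalid values in a single pass and replaces each run of invalid values with the average of the valid values on either side. This is an assumption about what averaging might look like, not the crate's generated code: the State enum and fill_invalid_with_average are hypothetical names, valid-streak is ignored, and the handling of leading and trailing invalid runs is a choice made here for simplicity.

```rust
// Illustrative only: a guess at an "average over invalid runs" automaton,
// NOT the code sanitise generates. All names here are hypothetical.

/// States of the hypothetical per-column automaton.
enum State {
    /// No valid value seen yet; `pending` invalid values are waiting.
    Start { pending: usize },
    /// The most recent value was valid.
    Valid { last: i64 },
    /// Inside a run of `pending` invalid values that follows the valid value `last`.
    Invalid { last: i64, pending: usize },
}

/// Replace values outside `min..=max` with the integer average of the
/// neighbouring valid values. If the column has no valid value at all,
/// the result is empty.
fn fill_invalid_with_average(values: &[i64], min: i64, max: i64) -> Vec<i64> {
    let mut out = Vec::with_capacity(values.len());
    let mut state = State::Start { pending: 0 };
    for &value in values {
        let valid = (min..=max).contains(&value);
        state = match (state, valid) {
            (State::Start { pending }, true) => {
                // Leading invalid run: only one neighbour exists, so reuse it.
                out.extend(std::iter::repeat(value).take(pending));
                out.push(value);
                State::Valid { last: value }
            }
            (State::Start { pending }, false) => State::Start { pending: pending + 1 },
            (State::Valid { .. }, true) => {
                out.push(value);
                State::Valid { last: value }
            }
            (State::Valid { last }, false) => State::Invalid { last, pending: 1 },
            (State::Invalid { last, pending }, true) => {
                // The run has ended: fill it with the average of both neighbours.
                let average = (last + value) / 2;
                out.extend(std::iter::repeat(average).take(pending));
                out.push(value);
                State::Valid { last: value }
            }
            (State::Invalid { last, pending }, false) => State::Invalid {
                last,
                pending: pending + 1,
            },
        };
    }
    // Trailing invalid run: fall back to the last valid value.
    if let State::Invalid { last, pending } = state {
        out.extend(std::iter::repeat(last).take(pending));
    }
    out
}

fn main() {
    let pulse = [60, 200, 300, 80, 132];
    let cleaned = fill_invalid_with_average(&pulse, 40, 100);
    println!("{cleaned:?}"); // prints [60, 70, 70, 80, 80]
}
```

The automaton needs only the last valid value and a counter, so memory use stays constant per column regardless of how long an invalid run is.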
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.