#csv #file-format #detect #sniffer #optimized #data #qsv

bin+lib qsv-sniffer

A CSV file format sniffer for Rust, optimized for qsv

21 releases

0.10.3 Mar 31, 2024
0.10.1 Dec 20, 2023
0.10.0 Aug 1, 2023
0.9.4 Jun 7, 2023
0.2.1 Mar 19, 2022

#305 in Parser implementations


1,725 downloads per month
Used in 2 crates

MIT license

56KB
1K SLoC

qsv CSV sniffer

Documentation

qsv-sniffer provides methods to infer CSV file metadata (delimiter choice, quote character, number of fields, field names, field data types, etc.). See the documentation for more details.

It's a detached fork of csv-sniffer with these additional capabilities, detecting:

  • utf-8 encoding
  • field names
  • number of rows
  • average record length
  • additional data types - Date/DateTime and NULL
  • smarter Boolean type detection - "true" and "false" are not the only Boolean values it detects. It also detects 1/0, yes/no, y/n and t/f, all case-insensitively
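
The Boolean spellings in the last item can be illustrated with a minimal sketch. The function name looks_like_boolean and the literal table are hypothetical and written only for illustration; they are not part of the qsv-sniffer API, and the crate's actual detection logic may differ.

```rust
/// Hypothetical helper, NOT part of the qsv-sniffer API.
/// Returns `true` when `value` matches one of the Boolean spellings
/// listed above, ignoring ASCII case.
pub fn looks_like_boolean(value: &str) -> bool {
    // Accepted spellings, taken from the feature list above.
    const BOOLEAN_LITERALS: [&str; 10] = [
        "1", "0", "yes", "no", "y", "n", "true", "false", "t", "f",
    ];

    // Compare without allocating a lowercased copy of the input.
    BOOLEAN_LITERALS
        .iter()
        .any(|literal| value.eq_ignore_ascii_case(literal))
}
```

Under these assumptions, looks_like_boolean("YES") and looks_like_boolean("f") both return true, while looks_like_boolean("maybe") returns false. A type sniffer would typically apply such a check to every sampled value in a column before labelling the column Boolean.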

ℹ️ NOTE: This fork is optimized to support qsv, and its development will be primarily dictated by qsv's requirements.

Setup

As a Command-line application

cargo install qsv-sniffer

This will install a binary named sniff.

As a Library

Add this to your Cargo.toml:

[dependencies]
qsv-sniffer = "0.10"

and this to your crate root:

use qsv_sniffer;

Feature flags

  • cli - to build the sniff binary
  • runtime-dispatch-simd - enables detection of SIMD capabilities at runtime, which allows using the SSE2 and AVX2 code paths (only works on Intel and AMD architectures; ignored on other architectures).
  • generic-simd - enables architecture-agnostic SIMD capabilities, but only works with Rust nightly.

The SIMD features are mutually exclusive and increase sampling performance.
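
To make the idea of runtime SIMD detection concrete, here is a minimal sketch using the standard library's is_x86_feature_detected! macro. The function best_simd_path is hypothetical and only reports which code path a dispatcher could choose; it is an assumption about the general technique, not a description of how qsv-sniffer or its dependencies implement the runtime-dispatch-simd feature.

```rust
/// Hypothetical helper, NOT part of the qsv-sniffer API.
/// Names the widest SIMD code path the current CPU supports,
/// falling back to "scalar" when none is available.
pub fn best_simd_path() -> &'static str {
    // Runtime feature detection only exists for Intel/AMD targets;
    // on every other architecture this block is compiled out.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx2") {
            return "avx2";
        }
        if std::is_x86_feature_detected!("sse2") {
            return "sse2";
        }
    }

    // No SIMD path detected (or a non-x86 target).
    "scalar"
}
```

The check happens when the program runs rather than when it is compiled, so a single binary can use AVX2 on machines that have it and still run on older CPUs.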

Example

This example shows how to write a simple command-line tool for discovering the metadata of a CSV file:

use qsv_sniffer;

use std::env;

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 2 {
        eprintln!("Usage: {} <file>", args[0]);
        std::process::exit(1);
    }

    // sniff the path provided by the first argument
    match qsv_sniffer::Sniffer::new().sniff_path(&args[1]) {
        Ok(metadata) => {
            println!("{}", metadata);
        },
        Err(err) => {
            eprintln!("ERROR: {}", err);
        }
    }
}

This example is provided as the primary binary for this crate. In the source directory, this can be run as:

$ cargo run -- tests/data/library-visitors.csv

Dependencies

~7–15MB
~142K SLoC