21 releases

0.10.3 Mar 31, 2024
0.10.1 Dec 20, 2023
0.10.0 Aug 1, 2023
0.9.4 Jun 7, 2023
0.2.1 Mar 19, 2022

#462 in Parser implementations

1,404 downloads per month
Used in 2 crates

MIT license

56KB
1K SLoC

qsv CSV sniffer

Documentation

qsv-sniffer provides methods to infer CSV file metadata (delimiter choice, quote character, number of fields, field names, field data types, etc.). See the documentation for more details.
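To make "delimiter choice" concrete, here is a minimal sketch of one common heuristic: pick the candidate character that appears the same non-zero number of times on every sampled line. This is an illustration only, not qsv-sniffer's actual algorithm. The function name `guess_delimiter` and the candidate list are hypothetical, and the sketch deliberately ignores quoted fields.

```rust
/// Hypothetical helper, NOT part of the qsv-sniffer API.
/// Guesses a delimiter by looking for a candidate character whose count is
/// identical (and non-zero) on every non-empty line of the sample.
/// Quoting is ignored, so delimiters inside quoted fields will confuse it.
fn guess_delimiter(sample: &str) -> Option<char> {
    // Assumed candidate set; a real sniffer may consider others.
    const CANDIDATES: [char; 4] = [',', '\t', ';', '|'];

    let lines: Vec<&str> = sample.lines().filter(|l| !l.is_empty()).collect();
    if lines.is_empty() {
        return None;
    }

    // Best candidate so far, with its per-line occurrence count.
    let mut best: Option<(char, usize)> = None;
    for &cand in CANDIDATES.iter() {
        let first = lines[0].matches(cand).count();
        if first == 0 {
            continue;
        }
        // The candidate must split every line into the same number of fields.
        let consistent = lines.iter().all(|l| l.matches(cand).count() == first);
        // Among consistent candidates, prefer the one yielding more fields.
        if consistent && best.map_or(true, |(_, n)| first > n) {
            best = Some((cand, first));
        }
    }
    best.map(|(c, _)| c)
}
```

For example, `guess_delimiter("a;b;c\n1;2;3\n")` yields `Some(';')`, while a sample with no consistent candidate yields `None`.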

It's a detached fork of csv-sniffer with these additional capabilities, detecting:

  • UTF-8 encoding
  • field names
  • number of rows
  • average record length
  • additional data types - Date/DateTime and NULL
  • smarter Boolean type detection - "true" and "false" are not the only Boolean values it detects. It also detects 1/0, yes/no, y/n, and t/f, all case-insensitively
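The Boolean rule in the last item can be sketched as follows. This is a minimal illustration of case-insensitive matching against the value pairs listed above, not the crate's implementation. The names `parse_bool_like` and `column_is_boolean` are hypothetical, and how the crate prioritizes Boolean over integer for a column of 1/0 values is not covered here.

```rust
/// Hypothetical helper, NOT part of the qsv-sniffer API.
/// Maps the Boolean spellings listed above to a bool, ignoring ASCII case.
fn parse_bool_like(field: &str) -> Option<bool> {
    match field.to_ascii_lowercase().as_str() {
        "1" | "yes" | "y" | "true" | "t" => Some(true),
        "0" | "no" | "n" | "false" | "f" => Some(false),
        _ => None,
    }
}

/// Hypothetical helper: treats a column as Boolean only if it is non-empty
/// and every sampled value is one of the recognized spellings.
fn column_is_boolean(values: &[&str]) -> bool {
    !values.is_empty() && values.iter().all(|v| parse_bool_like(v).is_some())
}
```

With this sketch, `parse_bool_like("Yes")` gives `Some(true)` and `parse_bool_like("F")` gives `Some(false)`, while a column containing `"maybe"` is not classified as Boolean.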

ℹ️ NOTE: This fork is optimized to support qsv, and its development will be primarily dictated by qsv's requirements.

Setup

As a Command-line application

cargo install qsv-sniffer

This will install a binary named sniff.

As a Library

Add this to your Cargo.toml:

[dependencies]
qsv-sniffer = "0.10"

and this to your crate root:

use qsv_sniffer;

Feature flags

  • cli - to build the sniff binary
  • runtime-dispatch-simd - enables detection of SIMD capabilities at runtime, which allows using the SSE2 and AVX2 code paths. This only works on Intel and AMD architectures and is ignored on other architectures.
  • generic-simd - enables architecture-agnostic SIMD capabilities, but only works with Rust nightly.

The SIMD features are mutually exclusive and increase sampling performance.
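To illustrate what runtime detection of SIMD capabilities means in general, here is a minimal sketch using the standard library's `is_x86_feature_detected!` macro. It is not taken from qsv-sniffer or its dependencies. The function name `detected_simd_level` is hypothetical, and the sketch only reports which code path would be eligible rather than running any SIMD code.

```rust
/// Hypothetical helper, NOT part of the qsv-sniffer API.
/// On x86/x86_64, asks the CPU at runtime which SIMD extension is available
/// and reports the widest one among AVX2 and SSE2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn detected_simd_level() -> &'static str {
    if std::is_x86_feature_detected!("avx2") {
        "avx2"
    } else if std::is_x86_feature_detected!("sse2") {
        "sse2"
    } else {
        "scalar"
    }
}

/// On every other architecture the check is compiled out and the scalar
/// fallback is reported, mirroring "ignored on other architectures".
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn detected_simd_level() -> &'static str {
    "scalar"
}
```

The `cfg` attributes select one of the two definitions at compile time, so the same call site works on any target.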

Example

This example shows how to write a simple command-line tool for discovering the metadata of a CSV file:

use qsv_sniffer;

use std::env;

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 2 {
        eprintln!("Usage: {} <file>", args[0]);
        ::std::process::exit(1);
    }

    // sniff the path provided by the first argument
    match qsv_sniffer::Sniffer::new().sniff_path(&args[1]) {
        Ok(metadata) => {
            println!("{}", metadata);
        },
        Err(err) => {
            eprintln!("ERROR: {}", err);
        }
    }
}

This example is provided as the primary binary for this crate. In the source directory, this can be run as:

$ cargo run -- tests/data/library-visitors.csv

Dependencies

~9–15MB
~164K SLoC