8 releases
Version | Released |
---|---|
0.2.4 | Feb 28, 2024 |
0.2.3 | Dec 5, 2023 |
0.1.4 | Nov 15, 2023 |
0.1.3 | Oct 23, 2023 |
Rust csvchk
Vertical view of delimited text records.
Usage
Run with -h|--help to read usage:
$ csvchk -h
Usage: csvchk [OPTIONS] [FILES]...
Arguments:
[FILES]... [default: -]
Options:
-s, --separator <SEPARATOR>
-l, --limit <LIMIT> [default: 1]
-n, --number
-N, --no-headers
-d, --dense
-c, --columns <COLUMNS>
-h, --help Print help
-V, --version Print version
Default Input is STDIN
The input filename is optional and defaults to "-", which means the program reads from STDIN. Both of the following are equivalent:
$ cat tests/inputs/books.csv | csvchk
// ****** Record 1 ******//
Author : Émile Zola
Year : 1865
Title : La Confession de Claude
$ csvchk - < tests/inputs/books.csv
// ****** Record 1 ******//
Author : Émile Zola
Year : 1865
Title : La Confession de Claude
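The "-" convention can be sketched in a few lines of Rust. This is only an illustration of the idea, not code taken from csvchk; the helper names `open_input` and `first_line` are hypothetical.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Open a named file for buffered reading, or STDIN when the name is "-".
/// (`open_input` is a hypothetical helper name, not csvchk's own.)
fn open_input(filename: &str) -> io::Result<Box<dyn BufRead>> {
    match filename {
        "-" => Ok(Box::new(BufReader::new(io::stdin()))),
        path => Ok(Box::new(BufReader::new(File::open(path)?))),
    }
}

/// Read the first line of the chosen input, if there is one.
/// The trailing newline is stripped by `lines()`.
fn first_line(filename: &str) -> io::Result<Option<String>> {
    open_input(filename)?.lines().next().transpose()
}
```

Returning a boxed `BufRead` lets the rest of a program treat a file and STDIN identically.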
Default Limit to One Record
By default, the program will show you the first record:
$ csvchk tests/inputs/books.csv
// ****** Record 1 ******//
Author : Émile Zola
Year : 1865
Title : La Confession de Claude
Use the -l|--limit option to indicate more records:
$ csvchk tests/inputs/books.csv --limit 2
// ****** Record 1 ******//
Author : Émile Zola
Year : 1865
Title : La Confession de Claude
// ****** Record 2 ******//
Author : Samuel Beckett
Year : 1952
Title : Waiting for Godot
If you use 0, then all records will be shown:
$ csvchk tests/inputs/books.csv --limit 0
// ****** Record 1 ******//
Author : Émile Zola
Year : 1865
Title : La Confession de Claude
// ****** Record 2 ******//
Author : Samuel Beckett
Year : 1952
Title : Waiting for Godot
// ****** Record 3 ******//
Author : Jules Verne
Year : 1870
Title : 20,000 Leagues Under the Sea
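The limit rule described above (a positive number caps the output, 0 means no cap) might look like the following. This is a minimal sketch of the rule only; `apply_limit` is a hypothetical name and csvchk may implement it differently.

```rust
/// Keep at most `limit` records; a limit of 0 means "no limit".
/// (`apply_limit` is a hypothetical name used only for this sketch.)
fn apply_limit<T>(records: Vec<T>, limit: usize) -> Vec<T> {
    if limit == 0 {
        // 0 is the "show everything" sentinel
        records
    } else {
        records.into_iter().take(limit).collect()
    }
}
```

A limit larger than the number of records simply returns all of them.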
Number Columns
The -n|--number option will show you 1-based column numbers suitable for field selection with awk, cut, or cutr.
For instance, to extract the year of publication, first find the column number and then pass it to cut:
$ csvchk tests/inputs/books.tsv -n
// ****** Record 1 ******//
1 Author : Émile Zola
2 Year : 1865
3 Title : La Confession de Claude
$ cut -f 2 tests/inputs/books.tsv
Year
1865
1952
1870
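To make the vertical layout concrete, here is a rough sketch of rendering one record with optional 1-based column numbers. The function name `format_record` and the exact padding are assumptions made for illustration; csvchk's real spacing may differ.

```rust
/// Render one record vertically, one "name : value" line per field.
/// With `number` set, each line is prefixed with its 1-based column number.
/// The padding widths are an assumption; csvchk's exact spacing may differ.
fn format_record(headers: &[&str], values: &[&str], number: bool) -> String {
    // Pad every name to the longest one so the colons line up.
    let width = headers.iter().map(|h| h.chars().count()).max().unwrap_or(0);
    let mut out = String::new();
    for (i, (name, value)) in headers.iter().zip(values).enumerate() {
        if number {
            out.push_str(&format!("{:>3} ", i + 1));
        }
        out.push_str(&format!("{:<width$} : {}\n", name, value, width = width));
    }
    out
}
```

The numbers start at 1 rather than 0 so that they can be handed directly to tools like cut -f.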
No Headers
Some files have no headers:
$ cat tests/inputs/nohdr.csv
a,b,c
d,e,f
g,h,i
The -N|--no-headers option will supply "Field*" names:
$ csvchk --no-headers tests/inputs/nohdr.csv
// ****** Record 1 ******//
Field1 : a
Field2 : b
Field3 : c
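Generating the placeholder names is straightforward. The following is a small sketch assuming the names are simply "Field" followed by the 1-based column position, as the output above suggests; `default_headers` is a hypothetical name.

```rust
/// Build placeholder names "Field1", "Field2", ... for a header-less file.
/// (`default_headers` is a hypothetical name used only for this sketch.)
fn default_headers(num_fields: usize) -> Vec<String> {
    (1..=num_fields).map(|i| format!("Field{}", i)).collect()
}
```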
Defining/Overriding Column Names
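Use -c|--columns to supply your own column names, e.g., in the case of a file with no headers. Unless -N|--no-headers is also given, the first row is still consumed as the header row, so the first record shown here is the file's second line: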
$ csvchk -c 1,2,3 tests/inputs/nohdr.csv
// ****** Record 1 ******//
1 : d
2 : e
3 : f
Even with a file that has headers, you can override the column names:
$ csvchk -c 1,2,3 tests/inputs/books.tsv
// ****** Record 1 ******//
1 : Émile Zola
2 : 1865
3 : La Confession de Claude
Note that --no-headers causes the first row to be treated as a data row:
$ csvchk -c 1,2,3 -N tests/inputs/books.tsv -l 2
// ****** Record 1 ******//
1 : Author
2 : Year
3 : Title
// ****** Record 2 ******//
1 : Émile Zola
2 : 1865
3 : La Confession de Claude
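The interplay between --columns and --no-headers shown in these runs can be summarized in code. This sketch is inferred from the example output only and is not csvchk's source; `resolve_headers` is a hypothetical name.

```rust
/// Decide the column names and which rows count as data.
/// - Without `no_headers`, the first row is consumed as the header row.
/// - `columns`, when given, replaces whatever names would otherwise be used.
/// - With `no_headers` and no `columns`, names fall back to Field1, Field2, ...
/// (Behavior inferred from the README examples; the name is hypothetical.)
fn resolve_headers(
    rows: &[Vec<String>],
    no_headers: bool,
    columns: Option<&[&str]>,
) -> (Vec<String>, Vec<Vec<String>>) {
    let (first, data): (Option<&Vec<String>>, &[Vec<String>]) = if no_headers {
        (None, rows)
    } else {
        match rows.split_first() {
            Some((head, rest)) => (Some(head), rest),
            None => (None, rows),
        }
    };
    let width = rows.first().map_or(0, |r| r.len());
    let names: Vec<String> = match (columns, first) {
        (Some(cols), _) => cols.iter().map(|c| c.to_string()).collect(),
        (None, Some(head)) => head.clone(),
        (None, None) => (1..=width).map(|i| format!("Field{}", i)).collect(),
    };
    (names, data.to_vec())
}
```

The key point is that --columns only renames; whether the first row is data is decided by --no-headers alone.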
Detects Field Separator
Here is a CSV file:
$ cat tests/inputs/books.csv
Author,Year,Title
Émile Zola,1865,La Confession de Claude
Samuel Beckett,1952,Waiting for Godot
Jules Verne,1870,"20,000 Leagues Under the Sea"
By default, fields are assumed to be comma-separated:
$ csvchk tests/inputs/books.csv
// ****** Record 1 ******//
Author : Émile Zola
Year : 1865
Title : La Confession de Claude
Here is a tab-delimited file:
$ cat tests/inputs/books.tsv
Author Year Title
Émile Zola 1865 La Confession de Claude
Samuel Beckett 1952 Waiting for Godot
Jules Verne 1870 20,000 Leagues Under the Sea
The tab separator is detected without any extra options:
$ csvchk tests/inputs/books.tsv
// ****** Record 1 ******//
Author : Émile Zola
Year : 1865
Title : La Confession de Claude
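One simple way to guess between commas and tabs is to count each in the header line. This is purely an illustrative heuristic and an assumption on my part; csvchk may rely on a different signal, such as the file extension. `guess_separator` is a hypothetical name.

```rust
/// Guess the field separator from the first line of a file.
/// Illustrative heuristic only: prefer tab when it occurs more often than
/// comma, otherwise fall back to comma. Not necessarily how csvchk decides.
fn guess_separator(first_line: &str) -> char {
    let tabs = first_line.matches('\t').count();
    let commas = first_line.matches(',').count();
    if tabs > commas {
        '\t'
    } else {
        ','
    }
}
```

Under this sketch a semicolon-delimited header falls back to comma, so such a file would need the separator stated explicitly.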
Indicate Separator
This file uses semicolons:
$ cat tests/inputs/books.txt
Author;Year;Title
Émile Zola;1865;La Confession de Claude
Samuel Beckett;1952;Waiting for Godot
Jules Verne;1870;20,000 Leagues Under the Sea
Use -s|--separator to indicate the separator (the semicolon is escaped here so the shell does not interpret it):
$ csvchk -s \; tests/inputs/books.txt
// ****** Record 1 ******//
Author : Émile Zola
Year : 1865
Title : La Confession de Claude
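The separator matters because it decides where one field ends and the next begins, and why "20,000 Leagues Under the Sea" must be quoted in the comma-separated file but not in the semicolon one. Below is a simplified field splitter to illustrate this; it is a sketch under the usual CSV quoting conventions, not csvchk's parser, and `split_fields` is a hypothetical name.

```rust
/// Split one line into fields on `sep`, honoring double-quoted fields so that
/// a separator inside quotes is kept as data. A doubled quote ("") inside a
/// quoted field stands for one literal quote.
/// Simplified sketch: fields spanning multiple lines are not handled.
fn split_fields(line: &str, sep: char) -> Vec<String> {
    let mut fields = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    let mut chars = line.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // Escaped quote inside a quoted field
            '"' if in_quotes && chars.peek() == Some(&'"') => {
                current.push('"');
                chars.next();
            }
            // Opening or closing quote
            '"' => in_quotes = !in_quotes,
            // Unquoted separator ends the current field
            c if c == sep && !in_quotes => fields.push(std::mem::take(&mut current)),
            c => current.push(c),
        }
    }
    fields.push(current);
    fields
}
```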
Multiple Files
When run with multiple files, a header naming each file is inserted before its records. Note in the following run that the field separator is guessed for each input file:
$ csvchk tests/inputs/nohdr.csv tests/inputs/movies1.csv \
> tests/inputs/movies2.csv tests/inputs/movies2.tsv tests/inputs/books.tsv
==> tests/inputs/nohdr.csv <==
// ****** Record 1 ******//
a : d
b : e
c : f
==> tests/inputs/movies1.csv <==
// ****** Record 1 ******//
title : The Blues Brothers
year : 1980
director : John Landis
==> tests/inputs/movies2.csv <==
// ****** Record 1 ******//
title : The Blues Brothers
year : 1980
director : John Landis
==> tests/inputs/movies2.tsv <==
// ****** Record 1 ******//
title : The Blues Brothers
year : 1980
director : John Landis
==> tests/inputs/books.tsv <==
// ****** Record 1 ******//
Author : Émile Zola
Year : 1865
Title : La Confession de Claude
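The per-file header follows the same "==> name <==" style that head and tail use for multiple inputs. A minimal sketch of the rule, assuming the header is printed only when more than one file is given (as the single-file examples earlier suggest); `file_banner` is a hypothetical name.

```rust
/// Return the "==> name <==" banner only when several files are processed.
/// (`file_banner` is a hypothetical name used only for this sketch.)
fn file_banner(filename: &str, num_files: usize) -> Option<String> {
    if num_files > 1 {
        Some(format!("==> {} <==", filename))
    } else {
        None
    }
}
```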
Filtering
Use the -g|--grep option to find records matching a given pattern:
$ csvchk -g Hooper tests/inputs/movies2.csv
// ****** Record 2 ****** //
title : Les Misérables
year : 2012
director : Tom Hooper
The default is to use case-sensitive matching, so a search for "hooper" will find nothing.
Use the -i|--insensitive option to remedy this:
$ csvchk -g hooper -i tests/inputs/movies2.csv
// ****** Record 2 ****** //
title : Les Misérables
year : 2012
director : Tom Hooper
You can use a regular expression, for instance, to find "b" followed by either "l" or "r", case-insensitive, using -l 0 to get all matches:
$ csvchk --grep 'b[lr]' -i tests/inputs/movies2.csv -l 0
// ****** Record 1 ****** //
title : The Blues Brothers
year : 1980
director : John Landis
// ****** Record 2 ****** //
title : Les Misérables
year : 2012
director : Tom Hooper
Because -i makes matching case-insensitive, you can also write the regex in uppercase and get the same result:
$ csvchk --grep 'B[LR]' -i tests/inputs/movies2.csv -l 0
// ****** Record 1 ****** //
title : The Blues Brothers
year : 1980
director : John Landis
// ****** Record 2 ****** //
title : Les Misérables
year : 2012
director : Tom Hooper
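Two details are visible in the runs above: a matching record keeps its original record number, and the limit counts matches rather than rows read (the default limit of 1 still reaches Record 2). The sketch below illustrates that logic. To stay within the standard library it uses plain substring matching as a stand-in for a regular expression, so it is a simplification and not how csvchk matches; `matching_record_numbers` is a hypothetical name.

```rust
/// Return the 1-based numbers of records in which any field contains
/// `pattern`. Plain substring matching stands in for a regex in this sketch.
/// `limit` caps the number of matches returned; 0 means no cap.
fn matching_record_numbers(
    records: &[Vec<&str>],
    pattern: &str,
    insensitive: bool,
    limit: usize,
) -> Vec<usize> {
    let needle = if insensitive {
        pattern.to_lowercase()
    } else {
        pattern.to_string()
    };
    let matches = records
        .iter()
        .enumerate()
        .filter(|(_, rec)| {
            rec.iter().any(|field| {
                if insensitive {
                    field.to_lowercase().contains(needle.as_str())
                } else {
                    field.contains(needle.as_str())
                }
            })
        })
        // Keep the record's original position, counted from 1
        .map(|(i, _)| i + 1);
    if limit == 0 {
        matches.collect()
    } else {
        matches.take(limit).collect()
    }
}
```

Lowercasing both sides is a simple way to get case-insensitive behavior; a real regex engine would typically offer a dedicated flag for this instead.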
Author
Ken Youens-Clark kyclark@gmail.com