3 stable releases
1.1.0 | Jan 23, 2025 |
---|---|
1.0.1 | Jan 10, 2025 |
1.0.0 | Jan 7, 2025 |
#254 in Command line utilities
248 downloads per month
2MB
4K
SLoC
csvs
csvs (CSV Sql) is a command-line tool that simplifies working with CSV or TSV files by enabling SQL queries through an embedded SQLite engine. It is ideal for data analysts and developers who need SQL's flexibility to manage text-based data efficiently.
Features
SQL Power for CSV Files
Run advanced SQL queries, including JOIN
, GROUP BY
, SUM()
, or COUNT()
on CSV data.
Gain unparalleled flexibility to query, filter, sort, group, and combine data compared to traditional spreadsheet tools.
csvs also supports regular expressions in SQL queries. Refer to Regular Expressions Document.
Automatic Encoding Detection
Eliminate encoding issues with automatic detection of character encodings. Avoid garbled text and broken queries effortlessly.
Decide Data Type for Each Column
csvs scans CSV rows to determine the most appropriate data types for SQLite tables. This dynamic analysis ensures compatibility and precision, even when handling nullable fields.
See Decide Data Type for Each Column and Validating Number Document for details.
Multi-File Handling
Combine data from multiple CSV or TSV files by creating a temporary SQLite database using --in-file
. Easily perform
SQL joins across files in seconds.
Common use cases:
- Merging datasets from separate files.
- Cross-referencing data using SQL joins.
Customizable Output
Export query results as:
- CSV or TSV: Ideal for data sharing or further processing.
- SQLite Database: Retain results as
.db
files for future queries.
Control delimiters, headers, and quoting styles to suit your needs.
Interactive Mode
Explore datasets interactively without specifying queries upfront. Features include:
- Browsing imported tables.
- Ad-hoc query execution.
- Previewing and saving query results.
Multi-Statement Query Support
Execute multiple SQL statements in a single command. Transform and query data across multiple steps, with only the final result displayed.
Usage
- Display help:
csvs --help
Interactive Mode
Start csvs in interactive mode when neither --query
nor --source
is specified. This mode allows you to:
- View imported tables.
- Preview table content.
- Save query results to files interactively.
Command Options
Example
- Display version:
csvs --version
- Select specific fields from
./data/address.csv
and save the results topicked.csv
:
csvs -i ./data/address.csv -q 'SELECT "city","town","phone" FROM "address.csv"' -o picked.csv
- Process CSV data from
STDIN
:
csvs -q 'SELECT "city","town","phone" FROM "stdin"' < ./data/address.csv > picked.csv
or
cat ./data/address.csv | csvs -q 'SELECT "city","town","phone" FROM "stdin"' > picked.csv
- Perform SQL joins across multiple files:
csvs -i ./left.csv -i ./right.tsv -q 'SELECT * FROM "left.csv" AS l JOIN "right.tsv" AS r ON l."name"=r."name"'
- Leverage SQLite functions like
UPPER()
,COUNT()
, etc., and export results to a SQLite database:
csvs -i people.csv -q 'SELECT "city",COUNT(*) FROM "people.csv" GROUP BY "city" ORDER BY COUNT(*) DESC' --out-database out.db
- Start in interactive mode:
csvs -i MOCK_DATA.csv
SQL Query Notes
Mapping CSVs to Table Names
File names provided with --in-file
map directly to SQLite tables (e.g. ./sample/address.csv
becomes "address.csv"
,
and data.2024.csv
becomes "data.2024.csv"
).
Quoting Columns with Special Characters
Columns or table names with spaces, punctuation, or reserved words must be quoted. Example:
SELECT "first name", "last name"
FROM "contacts.csv"
--in-no-header
Option
If specified, column names default to "c1", "c2", "c3", etc., for header-less CSV files.
Execute Multiple Statements in a Single Query
Separate SQL statements with semicolons to execute multiple queries in sequence. Only the result of the final query is displayed.
Example:
SELECT "first name"
FROM "contacts.csv";
SELECT "age"
FROM "contacts.csv";
Error Handling
Building csvs
See Build Guide
Limitations
- Interactive mode cannot be invoked when CSV data is provided via
STDIN
. Use--in-file
to specify CSV files instead. - Large files may require significant RAM since csvs loads entire files into memory when
--out-database
is not specified. - CSV files with names starting with
sqlite_
cannot be used with--in-file
due to SQLite's reserved naming convention.
Acknowledgments
csvs relies on the following open-source projects:
- SQLite
- anyhow
- chardetng
- clap
- clap-help
- csv
- encoding_rs
- encoding_rs_rw
- indicatif
- lazy-regex
- ratatui
- r2d2
- r2d2_sqlite
- rusqlite
- smashquote
- sqlparser
- tracing
- tracing-logfmt
- tuirealm
- And many more! See the full list in the source repository.
License
csvs is licensed under the MIT license.
Dependencies
~58MB
~1M SLoC