1 unstable release
new 0.4.7 | Apr 26, 2025 |
---|
#1252 in Database interfaces
40KB
672 lines
Pirkle — Query CSV and SQLite with PRQL
Pirkle is a command-line tool to query CSV and SQLite files using the PRQL language.
It loads CSV files into an in-memory SQLite database — allowing you to join them with other tables, apply filters, and export results in table or CSV format.
Features
- Query CSV files as structured tables
- Join CSV and SQLite files together
- Write expressive queries using PRQL
- Output as a pretty table, CSV, JSON, or logfmt
- Inspect the generated SQL
- View schema information for files
- Lightweight, fast, and written in Rust
Installation
Prebuilt binaries
Prebuilt binaries are available on the Releases page.
The macOS binaries are code-signed and notarized with Apple, but as a command-line tool, you may still need to:
-
Right-click the executable and choose "Open" the first time you run it
OR
-
Remove the quarantine attribute via Terminal:
xattr -d com.apple.quarantine /path/to/pirkle
From crates.io
$ cargo install pirkle
From Source
Or install using Rust:
git clone https://github.com/dloss/pirkle.git
cd pirkle
cargo install --path .
Usage
Basic Queries
# Query a CSV file. CSV files are auto-loaded as SQLite tables.
$ pirkle examples/employees.csv --query "from employees | filter country == 'USA' | select {name, age}"
name age
---------------------
John Smith 32
Robert Johnson 41
James Brown 39
# Query a SQLite file (after generating with examples/make_sqlite.sh)
$ pirkle examples/company.sqlite --query "from employees | select {name, age}"
name age
--------------
Alice 30
Bob 45
Charlie 25
Diana 35
Eva 28
# Alternative syntax using -- delimiter
$ pirkle examples/employees.csv -- "from employees | filter department == 'Engineering' | select {name, age}"
name age
---------------------
John Smith 32
Robert Johnson 41
Ahmed Hassan 29
Sarah Kim 31
Reading from Standard Input
Pirkle supports reading CSV data from standard input, making it easy to pipe data from other commands:
# Pipe data into pirkle
$ cat examples/employees.csv | pirkle stdin --query "from stdin | filter salary > 70000"
# Use stdin with files
$ cat examples/orders.csv | pirkle stdin examples/customers.csv --query "from stdin | join customers (==customer_id)"
# Custom table name for stdin data
$ cat examples/employees.csv | pirkle stdin:workers --query "from workers | sort {-salary}"
Key features:
- Auto-detection: Data on stdin is loaded as a table named "stdin"
- Explicit reference: Use the filename
stdin
to read from stdin - Custom naming: Use
stdin:tablename
for custom table names - Query from stdin: If no query is provided with
--query
or--
, Pirkle will read the query from stdin:$ echo "from employees | filter country == 'USA'" | pirkle examples/employees.csv
- Multiple references: Use the same stdin data with different table names
$ cat examples/employees.csv | pirkle stdin:workers stdin:staff --query "from workers | join staff (==id)"
Pirkle intelligently determines how to use stdin based on your command arguments, making it a flexible tool for data pipelines.
Viewing Schema Information
To see the structure of your tables:
# View schemas with the --schema flag
$ pirkle examples/employees.csv --schema
Table: employees
Columns:
id (INTEGER)
name (TEXT)
department (TEXT)
age (INTEGER)
salary (INTEGER)
country (TEXT)
Show SQL without executing
You can use the --show-sql
flag to see the SQL that would be generated without executing the query:
$ pirkle examples/employees.csv --query "from employees | filter country == 'USA'" --show-sql
SELECT
*
FROM
employees
WHERE
country = 'USA'
-- Generated by PRQL compiler version:0.12.2 (https://prql-lang.org)
This also works with PRQL files:
$ pirkle examples/employees.csv --query examples/queries/avg_age_by_department.prql --show-sql
SELECT
department_id,
AVG(age) AS avg_age
FROM
employees
GROUP BY
department_id
-- Generated by PRQL compiler version:0.12.2 (https://prql-lang.org)
Output formats
Default is a readable table format.
To output CSV:
$ pirkle examples/employees.csv --format csv --query "from employees | filter salary > 70000"
1,John Smith,Engineering,32,85000,USA
3,Robert Johnson,Engineering,41,92000,USA
5,Ahmed Hassan,Engineering,29,75000,Egypt
8,Sarah Kim,Engineering,31,83000,South Korea
9,James Brown,Sales,39,85000,USA
10,Fatima Al-Farsi,Marketing,36,76000,UAE
Other supported formats:
# JSON Lines format
$ pirkle examples/employees.csv --format jsonl --query "from employees | filter country == 'USA'"
{"age":32,"country":"USA","department":"Engineering","id":1,"name":"John Smith","salary":85000}
{"age":41,"country":"USA","department":"Engineering","id":3,"name":"Robert Johnson","salary":92000}
{"age":39,"country":"USA","department":"Sales","id":9,"name":"James Brown","salary":85000}
# logfmt format
$ pirkle examples/employees.csv --format logfmt --query "from employees | filter country == 'USA'"
id="1" name="John Smith" department="Engineering" age="32" salary="85000" country="USA"
id="3" name="Robert Johnson" department="Engineering" age="41" salary="92000" country="USA"
id="9" name="James Brown" department="Sales" age="39" salary="85000" country="USA"
Using PRQL files
You can use prewritten PRQL query files:
# Use a PRQL file directly with --query
$ pirkle examples/employees.csv --query examples/queries/top_5_paid.prql
name department salary
---------------------------------------
Robert Johnson Engineering 92000
John Smith Engineering 85000
James Brown Sales 85000
Sarah Kim Engineering 83000
Fatima Al-Farsi Marketing 76000
Joining tables
To join tables, use the join operation:
$ pirkle examples/orders.csv examples/customers.csv --query "from orders
join customers (==customer_id)
select {orders.order_id, customers.name, orders.amount}"
order_id name amount
-------------------------------
1 Acme Corp 250
2 Globex Inc 300
3 Acme Corp 150
4 Initech 400
5 Stark Industries 200
Example Data
Included example files:
examples/employees.csv
: Employee data with department, salary, and country informationexamples/departments.csv
: Department names and IDsexamples/customers.csv
,examples/orders.csv
: Customer-order relationship dataexamples/queries/*.prql
: Sample PRQL queries
License
MIT
Dependencies
~56–84MB
~1.5M SLoC