2 releases
Version | Date
---|---
0.1.1 | Dec 10, 2023
0.1.0 | Dec 10, 2023
DFQ - DataFusion Query
A CLI tool for running SQL queries over various data sources using the Apache Arrow DataFusion SQL query engine.
Usage
$ dfq --help
A CLI for running SQLs over various data sources.
Usage: dfq [OPTIONS] [DATA_AND_SQL]...
Arguments:
[DATA_AND_SQL]... data sources and SQL, e.g. `sample.csv "select * from t0"`
Options:
-d, --dialect <DIALECT>
-o, --output <OUTPUT> [default: terminal] [possible values: json, csv, terminal]
-h, --help Print help
$ dfq samples/users.csv samples/orders.csv "select count(*) as num_orders, t0.name from t0 join t1 on t0.id = t1.user group by t0.name order by num_orders"
+------------+--------+
| num_orders | name   |
+------------+--------+
| 1          | Henry  |
| 2          | Taylor |
+------------+--------+
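The example above relies on a positional convention: every argument except the last is a data source, and the sources are exposed to SQL as `t0`, `t1`, ... in command-line order, with the SQL text coming last. The sketch below illustrates that convention only. It is a hypothetical helper (`split_args` is not part of dfq) and assumes the SQL is always the final argument.

```rust
/// Hypothetical sketch of dfq-style argument handling: all positional
/// arguments except the last are data sources, named t0, t1, ... in order;
/// the last argument is the SQL text. Not the crate's actual code.
fn split_args(args: &[String]) -> Option<(Vec<(String, String)>, String)> {
    // split_last yields (last element, everything before it).
    let (sql, sources) = args.split_last()?;
    let tables = sources
        .iter()
        .enumerate()
        .map(|(i, path)| (format!("t{i}"), path.clone()))
        .collect();
    Some((tables, sql.clone()))
}

fn main() {
    let args: Vec<String> = vec![
        "samples/users.csv".into(),
        "samples/orders.csv".into(),
        "select count(*) from t0 join t1 on t0.id = t1.user".into(),
    ];
    let (tables, sql) = split_args(&args).expect("at least one argument");
    for (name, path) in &tables {
        println!("{name} -> {path}");
    }
    println!("sql: {sql}");
}
```

With the arguments from the join example, `samples/users.csv` becomes `t0` and `samples/orders.csv` becomes `t1`, which is why the query can join `t0.id` against `t1.user`.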
$ dfq samples/orders.csv "describe t0"
+-------------+-------------------------+-------------+
| column_name | data_type               | is_nullable |
+-------------+-------------------------+-------------+
| id          | Int64                   | YES         |
| user        | Int64                   | YES         |
| ts          | Timestamp(Second, None) | YES         |
| status      | Utf8                    | YES         |
+-------------+-------------------------+-------------+
Status
Supported Data Sources
- Local newline-delimited JSON file, ending in `.json` or `.json.gz`
- (TODO) Local JSON array file
- Local CSV file, ending in `.csv` or `.csv.gz`
- Parquet file, ending in `.parquet` or `.prq`
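Since the input format is chosen from the file-name suffix, the rules in the list above can be summarized as a small lookup. The following is a hypothetical sketch (the `Source` enum and `detect` function are illustrative names, not dfq's API); it assumes suffix matching is case-sensitive and treats every `.json` file as newline-delimited, since JSON array input is still a TODO.

```rust
/// Input formats from the "Supported Data Sources" list (illustrative only).
#[derive(Debug, PartialEq)]
enum Source {
    NdJson { gzip: bool },
    Csv { gzip: bool },
    Parquet,
}

/// Hypothetical helper: pick a reader from the file-name suffix.
/// Returns None for unsupported suffixes.
fn detect(path: &str) -> Option<Source> {
    if path.ends_with(".json") {
        Some(Source::NdJson { gzip: false })
    } else if path.ends_with(".json.gz") {
        Some(Source::NdJson { gzip: true })
    } else if path.ends_with(".csv") {
        Some(Source::Csv { gzip: false })
    } else if path.ends_with(".csv.gz") {
        Some(Source::Csv { gzip: true })
    } else if path.ends_with(".parquet") || path.ends_with(".prq") {
        Some(Source::Parquet)
    } else {
        None
    }
}

fn main() {
    for path in ["samples/orders.csv", "events.json.gz", "data.prq", "notes.txt"] {
        // Prints the detected format (or None) for each path.
        println!("{path}: {:?}", detect(path));
    }
}
```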
Supported Output Formats
- Printed table format (default)
- JSON array format
- Newline-delimited JSON format
- CSV
- Parquet
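The two JSON output formats differ only in how rows are framed: a JSON array wraps all rows in a single `[...]` value, while newline-delimited JSON writes one object per line. The sketch below shows that framing; it is illustrative only, assumes each row has already been encoded as a JSON object string, and the function names are hypothetical rather than part of dfq.

```rust
/// Hypothetical: frame pre-encoded JSON objects as a single JSON array.
fn json_array(rows: &[&str]) -> String {
    format!("[{}]", rows.join(","))
}

/// Hypothetical: frame pre-encoded JSON objects as newline-delimited JSON,
/// one object per line, each terminated by '\n'.
fn json_lines(rows: &[&str]) -> String {
    rows.iter().map(|r| format!("{r}\n")).collect()
}

fn main() {
    // Rows modeled on the join example's result; not actual dfq output.
    let rows = [
        r#"{"num_orders":1,"name":"Henry"}"#,
        r#"{"num_orders":2,"name":"Taylor"}"#,
    ];
    println!("{}", json_array(&rows));
    print!("{}", json_lines(&rows));
}
```

The array form must be read as one complete document, whereas the line-delimited form can be consumed row by row, which suits streaming to stdout.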
All output currently goes to stdout; to save it, redirect it to a file manually.