13 major breaking releases

new 21.0.0 Mar 27, 2023
20.0.0 Mar 14, 2023
19.0.0 Feb 27, 2023
18.0.0 Feb 13, 2023
7.0.0 Feb 12, 2022


Apache-2.0


DataFusion Command-line Interface

DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

The DataFusion CLI allows SQL queries to be executed by an in-process DataFusion context.

USAGE:
    datafusion-cli [OPTIONS]

OPTIONS:
    -c, --batch-size <BATCH_SIZE>    The batch size of each query, or use DataFusion default
    -f, --file <FILE>...             Execute commands from file(s), then exit
        --format <FORMAT>            [default: table] [possible values: csv, tsv, table, json,
                                     nd-json]
    -h, --help                       Print help information
    -p, --data-path <DATA_PATH>      Path to your data, default to current directory
    -q, --quiet                      Reduce printing other than the results and work quietly
    -r, --rc <RC>...                 Run the provided files on startup instead of ~/.datafusionrc
    -V, --version                    Print version information

Example

Create a CSV file to query.

$ echo "1,2" > data.csv
$ datafusion-cli

DataFusion CLI v12.0.0

> CREATE EXTERNAL TABLE foo (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
0 rows in set. Query took 0.001 seconds.

> SELECT * FROM foo;
+---+---+
| a | b |
+---+---+
| 1 | 2 |
+---+---+
1 row in set. Query took 0.017 seconds.
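
The same kind of query can also be run non-interactively with the `-f`, `--format` and `-q` options listed under OPTIONS. The following is a minimal sketch rather than an official recipe: the file name `query.sql` and the `total` column alias are made up for illustration, and the CLI is only invoked if a `datafusion-cli` binary is found on the `PATH`.

```shell
# Recreate the sample data from the example above.
echo "1,2" > data.csv

# Write the statements to a script file (query.sql is a hypothetical name).
printf '%s\n' \
  "CREATE EXTERNAL TABLE foo (a INT, b INT) STORED AS CSV LOCATION 'data.csv';" \
  "SELECT a + b AS total FROM foo;" > query.sql

# Execute the file and exit, printing results as CSV with reduced extra output.
# The call is skipped when the binary is not installed.
if command -v datafusion-cli >/dev/null 2>&1; then
  datafusion-cli --quiet --format csv --file query.sql
fi
```

Keeping the statements in a file makes the run repeatable, and `--format csv` (or `json`) produces output that is easier to pipe into other tools than the default table layout.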

Querying S3 Data Sources

The CLI can query data in S3 if the following environment variables are defined:

  • AWS_REGION
  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY

Note that, until a known issue is resolved, AWS_REGION must be set to the region where the bucket exists.

Example:

$ aws s3 cp test.csv s3://my-bucket/
upload: ./test.csv to s3://my-bucket/test.csv

$ export AWS_REGION=us-east-1
$ export AWS_SECRET_ACCESS_KEY=***************************
$ export AWS_ACCESS_KEY_ID=**************

$ ./target/release/datafusion-cli
DataFusion CLI v12.0.0
> create external table test stored as csv location 's3://my-bucket/test.csv';
0 rows in set. Query took 0.374 seconds.
> select * from test;
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 1        | 2        |
+----------+----------+
1 row in set. Query took 0.171 seconds.
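
Because the CLI relies on the three environment variables listed above, it can help to verify them before starting a session. The helper below is an illustrative sketch only: `check_s3_env` is a hypothetical function name and is not part of datafusion-cli.

```shell
# check_s3_env: hypothetical helper, not shipped with datafusion-cli.
# Returns 0 when all three variables are set and non-empty, 1 otherwise.
check_s3_env() {
  missing=0
  for var in AWS_REGION AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    # Look up the variable whose name is stored in $var.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible usage: start the CLI only when the check passes.
#   check_s3_env && datafusion-cli
```

The check only confirms that the variables are present. It does not validate the credentials or confirm that AWS_REGION matches the bucket's region.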

DataFusion-Cli

Build datafusion-cli by changing into its sub-directory and running cargo build:

cd datafusion-cli
cargo build
