
bin+lib ballista-cli

Command Line Client for Ballista distributed query engine

5 releases (breaking)

0.12.0 Feb 7, 2024
0.11.0 Feb 28, 2023
0.10.0 Nov 21, 2022
0.9.0 Oct 26, 2022
0.7.0 May 16, 2022


Apache-2.0


Ballista Command-line Interface

Ballista is a distributed query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

The Ballista CLI allows SQL queries to be executed by an in-process DataFusion context, or by a distributed Ballista context.

USAGE:
    ballista-cli [FLAGS] [OPTIONS]

FLAGS:
    -h, --help       Prints help information
    -q, --quiet      Reduce printing other than the results and work quietly
    -V, --version    Prints version information

OPTIONS:
    -c, --batch-size <batch-size>    The batch size of each query, or use DataFusion default
    -p, --data-path <data-path>      Path to your data, default to current directory
    -f, --file <file>...             Execute commands from file(s), then exit
        --format <format>            Output format [default: table]  [possible values: csv, tsv, table, json, ndjson]
        --host <host>                Ballista scheduler host
        --port <port>                Ballista scheduler port

Example

Create a CSV file to query.

$ echo "1,2" > data.csv
$ ballista-cli

Ballista CLI v0.6.0

> CREATE EXTERNAL TABLE foo (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
0 rows in set. Query took 0.001 seconds.

> SELECT * FROM foo;
+---+---+
| a | b |
+---+---+
| 1 | 2 |
+---+---+
1 row in set. Query took 0.017 seconds.
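The same statements can also be run without the interactive prompt by combining the `--file`, `--format`, and `--quiet` options listed under USAGE. The following is a minimal sketch, not an official workflow: the file name `queries.sql` is a hypothetical choice, and the CLI is only invoked if `ballista-cli` is found on the `PATH`.

```shell
# Create the CSV file from the example above.
echo "1,2" > data.csv

# Write the SQL statements to a file (queries.sql is a hypothetical name).
cat > queries.sql <<'EOF'
CREATE EXTERNAL TABLE foo (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
SELECT * FROM foo;
EOF

# Execute the file and exit, printing results as CSV with reduced output.
# Skipped when ballista-cli is not installed.
if command -v ballista-cli >/dev/null 2>&1; then
    ballista-cli --quiet --format csv --file queries.sql
fi
```

Choosing `--format csv` or `--format json` instead of the default `table` makes the results easier to pass to other tools.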

Ballista-Cli

To run SQL against Ballista with ballista-cli, first build ballista-cli from source.

cd arrow-ballista/ballista-cli
cargo build

The Ballista CLI can connect to a Ballista scheduler for query execution.

ballista-cli --host localhost --port 50050
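Because the CLI can run queries either in-process or against a scheduler, a small wrapper can pick the mode from the environment. This is only a sketch: `SCHEDULER_HOST` and `SCHEDULER_PORT` are hypothetical variable names, the fallback port 50050 is simply the one used in the command above, and it assumes that omitting `--host` and `--port` selects the in-process DataFusion context.

```shell
# Print the ballista-cli invocation for the current environment.
# SCHEDULER_HOST and SCHEDULER_PORT are hypothetical variable names.
ballista_cmd() {
    if [ -n "${SCHEDULER_HOST:-}" ]; then
        # Scheduler configured: connect to the distributed Ballista context.
        echo "ballista-cli --host ${SCHEDULER_HOST} --port ${SCHEDULER_PORT:-50050}"
    else
        # No scheduler configured: assumed to run with the in-process context.
        echo "ballista-cli"
    fi
}

ballista_cmd
```

The function only prints the command line, so it can be inspected before being run.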
