Apache DataFusion sqllogictest
Apache DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
This crate is a submodule of DataFusion that contains an implementation of sqllogictest.
Overview
This crate uses sqllogictest-rs to parse and run .slt files in the test_files directory of
this crate or the data/sqlite directory of the datafusion-testing repository.
Testing setup
DataFusion uses the latest stable release of Rust:

rustup update stable

Then initialize the git submodules:

git submodule init
git submodule update --init --remote --recursive
Running tests: TLDR Examples
# Run all tests
cargo test --test sqllogictests
# Run all tests, with debug logging enabled
RUST_LOG=debug cargo test --test sqllogictests
# Run only the tests in `information_schema.slt`
cargo test --test sqllogictests -- information_schema
# Automatically update ddl.slt with expected output
cargo test --test sqllogictests -- ddl --complete
# Run ddl.slt, printing debug logging to stdout
RUST_LOG=debug cargo test --test sqllogictests -- ddl
Per-file timing summary
The sqllogictest runner can emit deterministic per-file elapsed timings to help identify slow test files.
By default (--timing-summary auto), the timing summary is disabled in local
TTY runs, and a top-slowest summary is shown in non-TTY/CI runs.
--timing-top-n / SLT_TIMING_TOP_N must be a positive integer (>= 1).
# Show top 10 slowest files (good for CI)
cargo test --test sqllogictests -- --timing-summary top --timing-top-n 10
# Show full per-file timing table
cargo test --test sqllogictests -- --timing-summary full
# Same controls via environment variables
SLT_TIMING_SUMMARY=top SLT_TIMING_TOP_N=15 cargo test --test sqllogictests
# Optional debug logging for per-task slow files (>30s), disabled by default
SLT_TIMING_DEBUG_SLOW_FILES=1 cargo test --test sqllogictests
Cookbook: Adding Tests
- Add queries
Add the setup and queries you want to run to a .slt file
(my_awesome_test.slt in this example) using the following format:
query
CREATE TABLE foo AS VALUES (1);
query
SELECT * from foo;
- Fill in expected answers with --complete mode
Running the following command will update my_awesome_test.slt with the expected output:
cargo test --test sqllogictests -- my_awesome_test --complete
- Verify the content
In the case above, my_awesome_test.slt will look something like
statement ok
CREATE TABLE foo AS VALUES (1);
query I
SELECT * from foo;
----
1
Assuming it looks good, check it in!
Cookbook: Testing for whitespace
The sqllogictest runner automatically strips trailing whitespace, so extra
effort is required to verify that trailing whitespace is correctly produced.
For example, the following test can't distinguish between Andrew and Andrew
(with a trailing space):
query T
select substr('Andrew Lamb', 1, 7)
----
Andrew
To test trailing whitespace, project an additional non-whitespace column on the
right. For example, by selecting '|' after the column of interest, the test
can distinguish between Andrew and Andrew  (with trailing spaces):
# Note two spaces between `Andrew` and `|`
query TT
select substr('Andrew Lamb', 1, 7), '|'
----
Andrew |
# Note only one space between `Andrew` and `|`
query TT
select substr('Andrew Lamb', 1, 6), '|'
----
Andrew |
Cookbook: Ignoring volatile output
Sometimes parts of a result change every run (timestamps, counters, etc.). To keep the rest of the snapshot checked in, replace those fragments with the <slt:ignore> marker inside the expected block. During validation the marker acts like a wildcard, so only the surrounding text must match.
query TT
EXPLAIN ANALYZE SELECT * FROM generate_series(100);
----
Plan with Metrics LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=0, end=100, batch_size=8192], metrics=[output_rows=101, elapsed_compute=<slt:ignore>, output_bytes=<slt:ignore>]
Reference
Running tests: Validation Mode
In this mode, sqllogictests runs the statements and queries in a .slt file, comparing the expected output in the
file to the output produced by that run.
For example, to run all tests suites in validation mode
cargo test --test sqllogictests
sqllogictests also supports cargo test style substring matches on file names to restrict which tests to run
# information_schema.slt matches due to substring matching `information`
cargo test --test sqllogictests -- information
Executing specific tests within a file is also supported. Tests are identified by line number within
the .slt file; for example, the following command runs the test at line 709 of information.slt along
with any other preparatory statements:
cargo test --test sqllogictests -- information:709
Running tests: Postgres compatibility
Test files that start with the prefix pg_compat_ verify compatibility
with Postgres by running the same script files both with DataFusion and with Postgres.
To run the sqllogictests against an existing running Postgres instance, do:
PG_COMPAT=true PG_URI="postgresql://postgres@127.0.0.1/postgres" cargo test --features=postgres --test sqllogictests
The environment variables:
- PG_COMPAT instructs sqllogictest to run against Postgres (not DataFusion)
- PG_URI contains a libpq-style connection string, whose format is described in the docs
One way to create a suitable postgres container in docker is to use
the Official Image with a command such as the following. Note that the
collation must be set to C, otherwise ORDER BY will not match DataFusion
and the tests will produce diffs.
docker run \
-p5432:5432 \
-e POSTGRES_INITDB_ARGS="--encoding=UTF-8 --lc-collate=C --lc-ctype=C" \
-e POSTGRES_HOST_AUTH_METHOD=trust \
postgres
If you do not want to create a new postgres database and you have docker installed you can skip providing a PG_URI env variable and the sqllogictest runner will automatically create a temporary postgres docker container. For example:
PG_COMPAT=true cargo test --features=postgres --test sqllogictests
Running Tests: tpch
Test files in the tpch directory run against the TPCH data set (SF =
0.1), which must be generated before running. You can use the following
command to generate the tpch data, assuming you are in the repository
root:
mkdir -p datafusion/sqllogictest/test_files/tpch/data
docker run -it \
-v "$(realpath datafusion/sqllogictest/test_files/tpch/data)":/data \
ghcr.io/scalytics/tpch-docker:main -vf -s 0.1
Then you need to add INCLUDE_TPCH=true to run tpch tests:
INCLUDE_TPCH=true cargo test --test sqllogictests
Running Tests: sqlite
Test files in data/sqlite directory of the datafusion-testing crate were
sourced from the sqlite test suite and have been cleansed and updated to
run within DataFusion's sqllogictest runner.
To run the sqlite tests, you need to increase the Rust stack size and set
INCLUDE_SQLITE=true:
export RUST_MIN_STACK=30485760;
INCLUDE_SQLITE=true cargo test --test sqllogictests
Note that there are well over 5 million queries in these tests and running the sqlite tests will take a long time. You may wish to run them in release-nonlto mode:
INCLUDE_SQLITE=true cargo test --profile release-nonlto --test sqllogictests
The sqlite tests can also be run with the postgres runner to verify compatibility:
export RUST_MIN_STACK=30485760;
PG_COMPAT=true INCLUDE_SQLITE=true cargo test --features=postgres --test sqllogictests
To update the sqlite expected answers, use the datafusion/sqllogictest/regenerate_sqlite_files.sh script.
Note that this must be run against an empty postgres instance. For example:
PG_URI=postgresql://postgres@localhost:5432/postgres bash datafusion/sqllogictest/regenerate_sqlite_files.sh
Updating tests: Completion Mode
In test script completion mode, sqllogictests reads a prototype script and runs the statements and queries against the
database engine. The output is a full script that is a copy of the prototype script with the results inserted.
You can update the tests / generate expected output by passing the --complete argument.
To regenerate and complete the sqlite test suite's files in datafusion-testing/data/sqlite/ please refer to the './regenerate_sqlite_files.sh' file.
WARNING: regenerate_sqlite_files.sh is experimental and should be understood and run with an abundance of caution. When run, the script will clone a remote repository locally, replace the location of a dependency with a custom git version, replace an existing .rs file with one from a GitHub gist, and run various commands locally.
# Update ddl.slt with output from running
cargo test --test sqllogictests -- ddl --complete
Running tests: scratchdir
The DataFusion sqllogictest runner automatically provides each test file
a directory named test_files/scratch/<filename>, creating it if needed
and clearing any existing contents.
For example, the test_files/copy.slt file should use scratch
directory test_files/scratch/copy.
Tests that need to write temporary files should write (only) to this directory to ensure they do not interfere with others concurrently running tests.
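As a hypothetical sketch of this convention (the COPY statement and expected row count here are illustrative, not taken from the actual copy.slt), a record in test_files/copy.slt would write only under its own scratch directory:

```sql
# Illustrative sketch: write only under this file's scratch directory,
# test_files/scratch/copy/, so concurrently running test files do not collide
query I
COPY (VALUES (1), (2)) TO 'test_files/scratch/copy/example.csv';
----
2
```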
Running tests: Substrait round-trip mode
This mode will run all the .slt test files in validation mode, adding a Substrait conversion round-trip for each generated DataFusion logical plan (SQL statement → DF logical → Substrait → DF logical → DF physical → execute).
Not all statements are round-tripped: statements like CREATE, INSERT, SET or EXPLAIN are issued as-is, but any other statement is round-tripped to/from Substrait.
WARNING: as there are still a lot of failures in this mode (https://github.com/apache/datafusion/issues/16248), it is not enforced in CI; instead, it needs to be run manually with the following command:
cargo test --test sqllogictests -- --substrait-round-trip
For focusing on one specific failing test, a file:line filter can be used:
cargo test --test sqllogictests -- --substrait-round-trip binary.slt:23
.slt file format
sqllogictest was originally written for SQLite to verify the
correctness of SQL queries against the SQLite engine. The format is
engine-agnostic: a runner parses sqllogictest files (.slt), runs the
queries against a SQL engine, and compares the output to the expected
output.
Tests in the .slt file are a sequence of query records generally
starting with CREATE statements to populate tables and then further
queries to test the populated data.
Each .slt file runs in its own, isolated SessionContext, to make the test setup explicit and so the files can run in
parallel. Thus it is important to keep the tests from having externally visible side effects (like writing to a global
location such as /tmp/).
Query records follow the format:
# <test_name>
query <type_string> <sort_mode>
<sql_query>
----
<expected_result>
- test_name: Uniquely identifies the test (DataFusion only)
- type_string: A short string that specifies the number of result columns and the expected datatype of each result column. There is one character in the <type_string> for each result column. The character codes are:
  - 'B' - Boolean,
  - 'D' - Datetime,
  - 'I' - Integer,
  - 'P' - timestamP,
  - 'R' - floating-point results,
  - 'T' - Text,
  - '?' - any other types
- expected_result: In the results section, some values are converted according to the following rules:
  - floating point values are rounded to a scale of 12,
  - NULL values are rendered as NULL,
  - empty strings are rendered as (empty),
  - boolean values are rendered as true/false,
  - this list may not be exhaustive; check datafusion/sqllogictest/src/engines/conversion.rs for details.
- sort_mode: If included, it must be one of nosort (the default), rowsort, or valuesort. In nosort mode, the results appear in exactly the order in which they were received from the database engine. The nosort mode should only be used on queries that have an ORDER BY clause or which have only a single row of results, since otherwise the order of results is undefined and might vary from one database engine to another. The rowsort mode gathers all output from the database engine and then sorts it by rows on the client side. Sort comparisons use sort_unstable on the rendered text representation of the values; hence, "9" sorts after "10", not before. The valuesort mode works like rowsort except that it does not honor row groupings; each individual result value is sorted on its own.

⚠️ It is encouraged to either apply ORDER BY or use rowsort for queries without explicit ORDER BY clauses.
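As an illustrative sketch of the conversion rules above (the query and values are hypothetical, not from a real test file), a floating-point result is rounded to a scale of 12 and an empty string is rendered as (empty):

```sql
# Illustrative: float rounded to a scale of 12; empty string rendered as (empty)
query RT
SELECT 1.0 / 3.0, '';
----
0.333333333333 (empty)
```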
Example
# group_by_distinct
query TTI
SELECT a, b, COUNT(DISTINCT c) FROM my_table GROUP BY a, b ORDER BY a, b
----
foo bar 10
foo baz 5
foo 4
3
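The text-based comparison used by rowsort can be sketched as follows (hypothetical values): because rows are compared as rendered text, "10" sorts before "2" and "9":

```sql
# Hypothetical rowsort example: without ORDER BY, rows are sorted as text
# on the client side, so "10" < "2" < "9" lexicographically
query I rowsort
SELECT * FROM (VALUES (10), (9), (2))
----
10
2
9
```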