#sql-query #query-parser #statement #pest-parser #results #implemented #filtering

bin+lib sql_query_parser

The sql_query_parser project provides a custom Rust parser for SQL-like queries, implemented using the Pest crate. It can parse SELECT statements with advanced query capabilities, including joins, conditional filtering, aggregate functions, grouping, ordering, and limiting the results.

1 unstable release

new 0.1.0 Nov 13, 2024

#19 in #results

Download history 96/week @ 2024-11-10

96 downloads per month

MIT license

19KB
261 lines

sql_query_parser

Overview

The sql_query_parser project provides a custom Rust parser for SQL-like queries, implemented using the Pest crate. It can parse SELECT statements with advanced query capabilities, including joins, conditional filtering, aggregate functions, grouping, ordering, and limiting the results.

Technical Description

The parser processes SQL-like queries by breaking down the input string into a series of tokens, recognizing various syntax elements such as keywords, identifiers (table or column names), operators, and literals (numbers and strings). Each of these elements is mapped to a specific rule in the grammar, which is defined using regular expressions in the pest parser syntax. The grammar specifies how tokens can be combined to form valid SQL-like statements. It is implemented as a set of grammar rules that correspond to different parts of the SQL query, such as SELECT, FROM, WHERE, JOIN, GROUP BY, etc.

The parser works by recursively applying these rules to build a parse tree (AST). The tree structure represents the relationships between the components of the query and contains detailed information about selected fields, target tables, filtering conditions, sorting directions etc. The AST can be used in various ways, such as further processing and integration with other systems for query execution, data retrieval and output formatting and working with databases in general.

The sql_query_parser processes base SQL-like clauses, specifically:

  • SELECT clause: Allows selecting fields or using aggregate functions (COUNT, SUM, AVG, MAX, MIN) on them.
  • FROM and JOIN clauses: Defines data sources and relationships between them using JOIN and ON with = condition on fields.
  • WHERE clause: Filters records with conditions on fields, supporting comparison (=, !=, <, >, <=, >=) and logical (AND, OR) operators.
  • GROUP BY clause: Enables grouping of results by multiple fields.
  • ORDER BY clause: Orders results based on specified fields, supporting ascending (ASC) and descending (DESC) sorting.
  • LIMIT clause: Limits the number of returned results.

After parsing, each node of the AST is displayed in a custom format defined in the fmt::Display implementation, which organizes the output in a clear, indented format. It can be useful for:

  • Analyzing query structure and understanding the components of the SQL query.
  • Identifying key parts of the query for optimization decisions.
  • Providing a clear structure to easily check for syntax errors in SQL queries.

Example Input

SELECT name, SUM(sales)
FROM products
WHERE quantity >= 50 AND producer = "Producer"
GROUP BY category, price
ORDER BY sales DESC
LIMIT 20; 

Example Output

- select_stmt
  - agg_field > identifier: "name"
  - agg_field
    - aggregate_func: "SUM"
    - identifier: "sales"
  - from_clause > identifier: "products"
  - where_clause
    - where_condition
        - identifier: "quantity"
        - comparison_op: ">="
        - number: "50"
    - logical_op: "AND"
    - where_condition
        - identifier: "producer"
        - comparison_op: "="
        - identifier: "\"Producer\""
  - group_by_clause
    - identifier: "category"
    - identifier: "price"
  - order_by_clause
    - identifier: "sales"
    - order_type: "DESC"
  - limit_clause > number: "20"

Grammar Rules

reserved_keyword = {
    "SELECT" | "FROM" | "JOIN" | "ON" | "WHERE" | "GROUP BY" |
    "ORDER BY" | "LIMIT" | "AND" | "OR" |
    "COUNT" | "SUM" | "AVG" | "MAX" | "MIN" |
    "ASC" | "DESC"
}

WHITESPACE     = _{ " " | "\t" | "\n" }
identifier     = @{ !reserved_keyword ~ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }
number         =  @{ ASCII_DIGIT+ }
string         =  { "\"" ~ (!"\"" ~ ANY)* ~ "\"" }
comparison_op  =  { "!=" | ">=" | "<=" |">" | "<" | "=" }
logical_op     =  { "AND" | "OR" }
aggregate_func =  { "COUNT" | "SUM" | "AVG" | "MAX" | "MIN" }
order_type     =  { "ASC" | "DESC"}

select_stmt = { "SELECT" ~ (agg_field ~ ("," ~ agg_field)* ~ ("," ~ identifier)*)
~ from_clause
~ (join_clause)? 
~ (where_clause)?
~ (group_by_clause)? 
~ (order_by_clause)? 
~ (limit_clause)? 
~ ";" }

agg_field = { aggregate_func ~ "(" ~ identifier ~ ")" | identifier }
where_condition = { identifier ~ comparison_op ~ (number | string) }

from_clause = { "FROM" ~ identifier }
join_clause = { "JOIN" ~ identifier ~ "ON" ~ identifier ~ "=" ~ identifier }
where_clause = { "WHERE" ~ where_condition ~ (logical_op ~ where_condition)* }
group_by_clause = { "GROUP BY" ~ identifier ~ ("," ~ identifier)* }
order_by_clause = { "ORDER BY" ~ identifier ~ (order_type)? ~ ("," ~ identifier ~ (order_type)?)* }
limit_clause = { "LIMIT" ~ number }

Dependencies

~2.2–3MB
~59K SLoC