14 releases

0.1.13 Jun 24, 2023
0.1.12 Jun 21, 2023
0.1.11 Mar 28, 2023
0.1.8 Feb 21, 2023
0.1.7 Jan 31, 2023


MIT license

25KB
608 lines


feature: phil-wadler
authors:

  • "ice1000"

start_date: "2022/12/01"


Wadler-style algebraic pretty printing API for SQL

This RFC proposes a new API for pretty-printing pseudo "structures". The purpose of the API is to supersede the current implementation of SQL explain, and maybe more.

Goals

  • Make the output of SQL explain "cooler", in the sense that ASCII (or Unicode) art is used to improve the readability of the output.
  • Implementation-wise, the API should be extensible and not tightly coupled with the actual SQL syntax so that it can be used for other purposes.
  • If things go well, the new API will replace the current implementation.

Non-goals

  • The new API is not designed for performance-critical applications. SQL explain is not considered to be such an application.
  • The new API does not aim to be super flexible like the pretty crate. This gives us room to simplify the design and implementation.

Motivation

I tried to use the pretty crate to implement SQL explain, but it turned out to be limited in many ways:

  • The standard Wadler-style pretty printing API only controls lines, indentation, text wrapping, etc., which is not enough for sophisticated insertion of box-making or table-making characters.
  • It does not support "wrapping" the output in any way. We want to draw a big "box" around the output.
  • It does not support the tree-making characters, which are essential but require a stack in the doc-to-string algorithm. The standard implementation of pretty is a pure (Config, Doc) -> String algorithm with potential configurations. I believe that we essentially need to upgrade this from a reader monad to a state monad.
  • It supports horizontal and vertical "squeezing" of the output (say, limit the max column/line numbers, and try to fit in by inserting/removing new lines), but we only need horizontal squeezing.

However, the standard Wadler-style "algebraic" pretty printing API is well-designed and can be extended to support the features we desire. I saw a screenshot by @xxchan on a private Slack channel that shows the SQL explain output of databend's system, which inspired me to write this RFC.

Intended behavior

  • Users specify a preferred width, usually the width of the terminal, or 80 or 120, etc.
  • The API automatically calculates the actual width of the output, based on the preferred width.
    • If everything fits on one line, the actual width is that line's width, and the output is a single line ("one-linear").
    • If the output does not fit on one line, the API breaks it into multiple lines and tries again to fit every line within the preferred width.
  • The API supports wrapping the output with beautiful ASCII/Unicode art.
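The width rule above can be sketched as follows. This is a minimal illustration, not the crate's algorithm: the function `lay_out` and its flat list-of-strings input are hypothetical, and the multi-line branch simply puts one item per line instead of retrying the fit recursively.

```rust
/// Hypothetical helper (not part of the crate): lays out `items` either on
/// one line or one item per line, and returns the actual width together
/// with the resulting lines. Widths are counted in chars.
pub fn lay_out(items: &[&str], preferred_width: usize) -> (usize, Vec<String>) {
    // One-linear attempt: join everything with ", ".
    let one_line = items.join(", ");
    let one_len = one_line.chars().count();
    if one_len <= preferred_width {
        // Everything fits: the actual width is the width of that line.
        return (one_len, vec![one_line]);
    }
    // Otherwise break into multiple lines; the actual width is the widest line.
    let lines: Vec<String> = items.iter().map(|s| s.to_string()).collect();
    let width = lines.iter().map(|l| l.chars().count()).max().unwrap_or(0);
    (width, lines)
}
```

With a preferred width of 80, `["a", "b"]` stays on one line of width 4. With a preferred width of 6, `["alpha", "be"]` is split into two lines and the actual width becomes 5.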

Implementation

These contents are subject to future changes.

Types

Types XmlNode and Pretty for pretty printing data

  • These types are mutually inductively defined, and each represents an object that can be displayed as a string.
  • The width and height of the pretty-printed string can be calculated in advance.
  • Instances of the enum Pretty are hereafter called "pretty" or "pretties".
  • Instances of struct XmlNode represent XML-like data that has a name, a list of attributes, and a list of children nodes.

Variants of Pretty:

  • Variant Record, which brutally pretty-prints XML-like data.
    • It contains an XML node.
  • Variant Array, which brutally pretty-prints array-like data.
    • It contains a list of pretties.
  • Variant Text that pretty-prints a string.
    • It contains a copy-on-write string.
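The description above can be sketched as a pair of mutually recursive Rust types. Treat this as an assumption-laden outline, not the crate's definitions: the field names `name`, `fields`, and `children`, the lifetime parameter, and the derives are all guesses for illustration.

```rust
use std::borrow::Cow;

/// XML-like data: a name, a list of attributes, and a list of children.
/// Field names are assumed for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct XmlNode<'a> {
    pub name: Cow<'a, str>,
    /// A vector of pairs, so that insertion order is preserved.
    pub fields: Vec<(Cow<'a, str>, Pretty<'a>)>,
    pub children: Vec<Pretty<'a>>,
}

/// Something that can be displayed as a string.
#[derive(Debug, Clone, PartialEq)]
pub enum Pretty<'a> {
    /// XML-like data.
    Record(XmlNode<'a>),
    /// Array-like data: a list of pretties.
    Array(Vec<Pretty<'a>>),
    /// A copy-on-write string.
    Text(Cow<'a, str>),
}
```

The recursion goes through `Vec`, so both types have a known size. Using `Cow<'a, str>` lets a caller pass either a borrowed `&str` or an owned `String` without forcing a copy.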

Record PrettyConfig for pretty printing configuration

It contains indentation, preferred width, etc.

Record LinedBuffer for actually writing the string

It contains a mutable reference to a String, and a PrettyConfig. It understands the intended width (precomputed by PrettyConfig::interesting_*), and will fill an incomplete line with spaces when asked to.
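The padding behavior can be illustrated with a small stand-in. `PadBuffer` and its methods are hypothetical names chosen for this sketch. It holds a plain width instead of a PrettyConfig, and it is not the crate's LinedBuffer.

```rust
/// Hypothetical stand-in for a lined buffer: writes into a borrowed String
/// and can pad the current line with spaces up to a known width.
pub struct PadBuffer<'a> {
    out: &'a mut String,
    /// Intended width of every line, in chars (assumed to be precomputed).
    width: usize,
    /// Number of chars already written on the current line.
    used: usize,
}

impl<'a> PadBuffer<'a> {
    pub fn new(out: &'a mut String, width: usize) -> Self {
        Self { out, width, used: 0 }
    }

    /// Appends text to the current line.
    pub fn push(&mut self, s: &str) {
        self.out.push_str(s);
        self.used += s.chars().count();
    }

    /// Fills the rest of the line with spaces, then starts a new line.
    pub fn pad_and_newline(&mut self) {
        let missing = self.width.saturating_sub(self.used);
        self.out.extend(std::iter::repeat(' ').take(missing));
        self.out.push('\n');
        self.used = 0;
    }
}
```

Because every line ends at the same column, a closing border character can later be placed at a fixed position on each line.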

Important methods

  • Pretty::ol_len_*(&self) -> usize
    • Returns the length of the pretty-printed string, under a one-linear setting.
  • Pretty::ol_build_string_*(&self, build: &mut String)
    • Builds the pretty-printed string, under a one-linear setting.
  • PrettyConfig::interesting_*
    • Predicts the width and the total length of the pretty-printed string.
  • LinedBuffer::line_* (private)
    • Generates a line, without the starting | and the ending | and the indentations. It will try to fill the intermediate spaces and lines, but not the surrounding ones.
  • PrettyConfig::horizon
    • Generates a line of a given length with + at the ends and - in the middle.
  • PrettyConfig::ascii
    • Calls interesting to predict the output width, and then generates the beautiful output, using pure ASCII style.
  • PrettyConfig::unicode
    • Calls interesting to predict the output width, and then generates the beautiful output, using Unicode table-making characters.
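The one-linear methods and the horizontal rule can be sketched on a trimmed-down type. Everything here is hypothetical: `Doc` stands in for Pretty without the Record variant, the `[a, b]` rendering is an assumed format, and `horizon` is a free function that treats the given length as the total width including both `+` characters.

```rust
use std::borrow::Cow;

/// Hypothetical, simplified stand-in for `Pretty` (no `Record` variant).
pub enum Doc<'a> {
    Text(Cow<'a, str>),
    Array(Vec<Doc<'a>>),
}

impl<'a> Doc<'a> {
    /// Length in chars of the one-line rendering, computed without
    /// building the string.
    pub fn ol_len(&self) -> usize {
        match self {
            Doc::Text(s) => s.chars().count(),
            Doc::Array(items) => {
                let inner: usize = items.iter().map(|d| d.ol_len()).sum();
                // Two brackets, plus ", " between adjacent items.
                2 + inner + 2 * items.len().saturating_sub(1)
            }
        }
    }

    /// Appends the one-line rendering to `out`.
    pub fn ol_build_string(&self, out: &mut String) {
        match self {
            Doc::Text(s) => out.push_str(s),
            Doc::Array(items) => {
                out.push('[');
                for (i, item) in items.iter().enumerate() {
                    if i > 0 {
                        out.push_str(", ");
                    }
                    item.ol_build_string(out);
                }
                out.push(']');
            }
        }
    }
}

/// A line with `+` at both ends and `-` in between.
/// Assumes `width` is the total width and is at least 2.
pub fn horizon(width: usize) -> String {
    format!("+{}+", "-".repeat(width.saturating_sub(2)))
}
```

The point of keeping `ol_len` separate is that the width can be predicted first and the string built only once the layout is decided. The two methods must agree, so the length returned by `ol_len` should always equal the char count of what `ol_build_string` writes.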

Edge cases

  • All methods handle empty lists and no-children records. No-field records are not tested yet.

Changelog

  • 2022 some time: added Unicode support
  • 2023/01/29: changed from BTreeMap to an association vector (a vector of key-value pairs) to preserve insertion order
  • 2023/01/31: fixed a bug related to pretty printing the fields
  • 2023/06/23: added support for fewer-whitespace pretty printing

No runtime deps