3 releases (breaking)

0.3.0 Mar 12, 2023
0.2.0 Mar 5, 2023
0.0.1 Feb 24, 2023

MIT/Apache

Syntax

This crate holds the Hebi lexer, parser, and AST.

The lexer is automatically generated using logos. The parser is a hand-written recursive descent parser.

The lexer records indentation by attaching, to the first non-whitespace token on each line, the number of whitespace characters that precede it. For example:

asdf
  asdf
  asdf asdf

This input produces the following tokens:

Identifier("asdf", indentation_level=0)
Identifier("asdf", indentation_level=2)
Identifier("asdf", indentation_level=2)
Identifier("asdf", indentation_level=None)

Note that the last token has no indentation level (None), because it is not the first non-whitespace token on its line.
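The rule above can be sketched in a few lines of Rust. This is only an illustration under simplifying assumptions: it splits on whitespace instead of using logos, and the Token struct and lex_with_indentation function are hypothetical names, not the crate's actual API.

```rust
/// Hypothetical token type for illustration; the crate's real token type
/// is generated with logos and will look different.
#[derive(Debug, PartialEq)]
struct Token<'a> {
    lexeme: &'a str,
    /// `Some(n)` only for the first non-whitespace token on a line,
    /// where `n` is the number of whitespace characters before it.
    indentation_level: Option<usize>,
}

/// Splits `src` into whitespace-separated tokens and attaches an
/// indentation level to the first token of every line.
fn lex_with_indentation(src: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    for line in src.lines() {
        // Count the whitespace characters that precede the first token.
        let indent = line.chars().take_while(|c| c.is_whitespace()).count();
        for (i, lexeme) in line.split_whitespace().enumerate() {
            tokens.push(Token {
                lexeme,
                // Only the first token on the line carries indentation.
                indentation_level: if i == 0 { Some(indent) } else { None },
            });
        }
    }
    tokens
}
```

Calling lex_with_indentation on the three-line input above yields four tokens whose levels are Some(0), Some(2), Some(2), and None, matching the listing.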

The parser uses the indentation levels to track blocks using these functions:

  • no_indent, the current token must not carry an indentation level (it is not the first token on its line)
  • indent_eq, the indentation level of the current token is equal to the level on top of the indentation stack
  • indent_gt, the indentation level of the current token is greater than the level on top of the indentation stack. This function also pushes the new indentation level onto the indentation stack.
  • dedent, the indentation level of the current token is lower than the level on top of the indentation stack. This function also pops the last indentation level off the indentation stack.
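A minimal sketch of how these four checks could work against an indentation stack follows. The Indent struct, its new and top helpers, and the bool return values are assumptions made for illustration. The real parser's signatures and error handling may differ.

```rust
/// Hypothetical indentation tracker; only the four check names come
/// from the description above, everything else is assumed.
struct Indent {
    stack: Vec<usize>,
}

impl Indent {
    /// Starts with level 0 so top-level statements compare equal to it.
    fn new() -> Self {
        Indent { stack: vec![0] }
    }

    /// The innermost (current) indentation level.
    fn top(&self) -> usize {
        *self.stack.last().unwrap_or(&0)
    }

    /// True if the token carries no indentation, i.e. it is not the
    /// first token on its line.
    fn no_indent(&self, level: Option<usize>) -> bool {
        level.is_none()
    }

    /// True if the token starts a line at the current level.
    fn indent_eq(&self, level: Option<usize>) -> bool {
        level == Some(self.top())
    }

    /// True if the token starts a line at a deeper level; on success
    /// the new level is pushed, opening a block.
    fn indent_gt(&mut self, level: Option<usize>) -> bool {
        match level {
            Some(n) if n > self.top() => {
                self.stack.push(n);
                true
            }
            _ => false,
        }
    }

    /// True if the token starts a line at a shallower level; on success
    /// the innermost level is popped, closing a block.
    fn dedent(&mut self, level: Option<usize>) -> bool {
        match level {
            Some(n) if n < self.top() => {
                self.stack.pop();
                true
            }
            _ => false,
        }
    }
}
```

In this sketch a block parser would call indent_gt on the first token of the body, indent_eq before each further statement, and dedent when the block ends. The base level 0 is never popped because no usize level can be lower than it.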

The parser calls these functions only at the places where indentation is significant. Everywhere else, parsing code can be written without regard to indentation. For example, the import_stmt node does not depend on indentation at all, so it never has to track it.
