57 releases

0.6.2 Jul 26, 2024
0.6.0 May 25, 2024
0.5.1 Dec 2, 2023
0.5.0 Oct 13, 2023
0.1.1 Nov 25, 2021

#36 in Programming languages

Download history 198/week @ 2024-05-24 19/week @ 2024-05-31 4/week @ 2024-06-07 1/week @ 2024-06-14 40/week @ 2024-06-21 131/week @ 2024-06-28 31/week @ 2024-07-05 530/week @ 2024-07-26 33/week @ 2024-08-02

52 downloads per month
Used in lambdascript

MIT license

1MB
12K SLoC

rustlr

LR-Style Parser Generator

A Tutorial with several examples is available.

New Version 0.6

This rewrite of key components consists of mainly internal changes except for how the parser interacts with the lexical tokenizer. Semantic actions in grammars can now make adjustments directly to the tokenizer. The version 0.5 style is preserved by giving the rustlr application the -zc option.


Besides traditional LR and LALR parser generation, Rustlr supports the following options

  1. An experimental feature that generates parsers for Selective Marcus-Leermakers grammars. This is a larger class of unambiguous grammars than traditional LR. For example, the following grammar
    S --> a B c  |  A B d
    B --> b | B b
    A --> a

is unambiguous but is not LR(k) because it cannot decide whether to reduce a to A with a fixed number of lookaheads. However, rustlr can still generate a deterministic parser (without backtracking) for this grammar (see the Appendix of the tutorial). Rustlr is the only practical parser generator that supports this capability.

  1. The option of creating the abstract syntax data types and semantic actions from the grammar. Rustlr grammars contain a sub-language that controls how ASTs are to be generated.

  2. Support for choosing bumpalo to create recursive ASTs that use references instead of smart pointers: this enables deep pattern matching on recursive structures.

  3. Recognizes regex-style operators *, + and ?, which simplify the writing of grammars and allow better ASTs to be created.

  4. Generates a lexical scanner automatically from the grammar.

  5. Operator precedence and associativity declarations further allow grammars to be written that's closer to EBNF syntax.

  6. The ability to train the parser, interactively or from script, for better error reporting.

  7. Generates parsers for Rust and for F#. Rustlr is designed to promote typed functional programming languages in the creation of compilers and language-analysis tools. Parser generation for other such languages will gradually become available.

Rustlr aims to simplify the creation of precise and efficient parsers and will continue to evolve and incorporate new features, though backwards compatibility will be maintained as much as possible.

Quick Example: Arithmetic Expressions and Their Abstract Syntax

The following are the contents of a Rustlr grammar, simplecalc.grammar:

auto
terminals + * - / ; ( )   # verbatim terminal symbols
valterminal Int i32       # terminal symbol with value
nonterminal E
nonterminal T : E  # specifies that AST for T should merge into E
nonterminal F : E
nonterminal ExpList
startsymbol ExpList
variant-group-for E BinaryOp + - * /    # controls AST generation

# production rules:
E --> E + T  | E - T | T
T --> T * F | T / F | F
F:Neg --> - F                   # 'Neg' names enum variant in AST
F --> Int | ( E )
ExpList --> E<;+> ;?    # ;-separated list with optional trailing ;


!mod simplecalc_ast; // !-lines are injected verbatim into the parser
!fn main()  {
!  let mut scanner1 = simplecalclexer::from_str("10+-2*4; 9-(4-1);");
!  let mut parser1 = make_parser(scanner1);
!  let parseresult = parse_with(&mut parser1);
!  let ast =
!    parseresult.
!    unwrap_or_else(|x| {
!       println!("Parsing errors encountered; results not guaranteed..");
!       x
!    });
!  println!("\nAST: {:?}\n",&ast);
!}//main

The grammar recognizes one or more arithmetic expressions separated by semicolons. In addition to a parser, the grammar generates a lexical scanner from the declarations of terminal symbols. It also created the following abstract syntax types and the semantic actions that produce instances of the types.

#[derive(Debug)]
pub enum E {
  BinaryOp(&'static str,LBox<E>,LBox<E>),
  Int(i32),
  Neg(LBox<E>),
  E_Nothing,
}
impl Default for E { fn default()->Self { E::E_Nothing } }

#[derive(Default,Debug)]
pub struct ExpList(pub Vec<LC<E>>,);

LBox and LC are structures that contain the line and column positions of the start of the AST constructs in the original source. This information is automatically inserted into the structures by the parser. LBox encapsulates a Box and serves as a custom smart pointer while LC contains the extra information in an exposed tuple. Both LBox<T> and LC<T> implement Deref<T> and DerefMut<T>, thus carrying the extra information non-intrusively.

Rustlr generates AST types based on the grammar but special declarations can control the precise structure of these types. A struct is normally generated for nonterminal symbols with a single production while an enum is generated for nonterminals with multiple productions, with a variant for each production. However, the enum variants generated from the productions for T and F are merged into the type for E by the declarations nonterminal T : E and nonterminal F : E. The variant-group-for declaration combined what would-have-been four variants into one. The Neg label on the unary minus rule separates that case from the "BinaryOp" variant group.

Rustlr AST types implement the Default trait so that a partial result is always returned even when parse errors are encountered.

Automatically generated AST types and semantic actions can always be manually overridden.

Specifying operator precedence and associativity instead of using the T and F categories is also supported.

The generated parser and lexer normally form a separate module. However, for this quick example we've injected a main directly into the parser to demonstrate how to invoke it. To run this example,

  1. Install rustlr as a command-line application: cargo install rustlr

  2. Create a Cargo crate for the sample and cargo add rustlr --no-default-features in the crate. Turning off default features will include only the runtime parsing routines.

  3. save the grammar in the crate as simplecalc.grammar. The filename determines the names of the modules created, and must have a .grammar suffix.

  4. Run the rustlr application in the crate with

rustlr simplecalc.grammar -o src/main.rs

  1. cargo run

The expected output is

AST: ExpList([BinaryOp("+", Int(10), BinaryOp("*", Neg(Int(2)), Int(4))), BinaryOp("-", Int(9), BinaryOp("-", Int(4), Int(1)))])

Rustlr can also be invoked from within Rust by calling the rustlr::generate function.


New in Version 0.6.1, 0.6.2

The -table option stores the parse table to a binary file instead of an inlined static array. Windows pathname compatibility issues fixed in 0.6.2.

New in Version 0.6.0

Semantic actions can now control the behavior of the lexical tokenizer.

New in Version 0.5.0

The option to install only the runtime parser, without parser generation routines

New in Versions 0.4.13

Boxed labels such as [x] are now represented by LC instead of LBox during auto-generation.

New in Versions 0.4.11 and 0.4.12

The wildcard _ token now carries the original text of the token as its semantic value by default. The variant-group directive is now deprecated (though still available) by variant-group-for.

New in Version 0.4.10:

When called from the rustlr::generate function, rustlr can be made completely silent if given the -trace 0 option. All reports are logged and returned by the function.

New in Version 0.4.9: Error logging option

Given a parser instance parser, it's now possible to call parser1.set_err_report(true), which will log parse errors internally instead of printing them to stderr. The error report can be retrieved by calling parser1.get_err_report().

New in Version 0.4.8: Conversion From Yacc/Bison Grammar.

If the rustlr executable is given a file path that ends in ".y", it will attempt to convert a yacc/bison style grammar into rustlr's own grammar syntax, stripping away all semantic actions and other language-specific content. All other command-line options are ignored.


Please consult the tutorial for further documentation.

Dependencies

~2.2–3.5MB
~55K SLoC