#lr #glr #compiler #context-free-grammar #bison #error-message #parser

rusty_lr

GLR, LR(1) and LALR(1) parser generator with custom reduce action

67 releases (38 stable)

new 3.2.0 Sep 7, 2024
2.7.2 Sep 2, 2024
1.6.1 Aug 13, 2024
0.12.3 Aug 7, 2024
0.5.2 Jul 29, 2024

#124 in Programming languages

Download history 169/week @ 2024-07-13 742/week @ 2024-07-20 642/week @ 2024-07-27 2134/week @ 2024-08-03 1232/week @ 2024-08-10 1105/week @ 2024-08-17 1267/week @ 2024-08-24 961/week @ 2024-08-31

4,746 downloads per month

MIT/Apache

165KB
3K SLoC

rusty_lr

crates.io docs.rs

GLR, LR(1) and LALR(1) parser generator for Rust.

Cargo Features

  • build : Enable buildscript tools.
  • fxhash : In parser table, replace std::collections::HashMap with FxHashMap from rustc-hash.
  • tree : Enable automatic syntax tree construction. This feature should be used on debug purpose only, since it will consume much more memory and time.
  • error : Enable detailed parsing error messages, for Display and Debug trait. This feature should be used on debug purpose only, since it will consume much more memory and time.

Features

  • GLR, LR(1) and LALR(1) parser generator
  • Provides procedural macros and buildscript tools
  • readable error messages, both for parsing and building grammar
  • pretty-printed syntax tree
  • compile-time DFA construction
  • customizable reduce action
  • resolving conflicts of ambiguous grammar
  • regex patterns partially supported

Example

// this define `EParser` struct
// where `E` is the start symbol
lr1! {
    %userdata i32;           // userdata type
    %tokentype char;         // token type
    %start E;                // start symbol
    %eof '\0';               // eof token

    // token definition
    %token zero '0';
    %token one '1';
    %token two '2';
    %token three '3';
    %token four '4';
    %token five '5';
    %token six '6';
    %token seven '7';
    %token eight '8';
    %token nine '9';
    %token plus '+';
    %token star '*';
    %token lparen '(';
    %token rparen ')';
    %token space ' ';

    // conflict resolving
    %left [plus star];                  // reduce first for token 'plus', 'star'

    // context-free grammars
    Digit(char): [zero-nine];           // character set '0' to '9'

    Number(i32)                         // type assigned to production rule `Number`
        : space* Digit+ space*          // regex pattern
    { Digit.into_iter().collect::<String>().parse().unwrap() };
    //    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this will be the value of `Number`
                                        // reduce action written in Rust code

    A(f32): A plus a2=A {
        *data += 1;                     // access userdata by `data`
        println!( "{:?} {:?} {:?}", A, plus, a2 );
        A + a2
    }
        | M
        ;

    M(f32): M star m2=M { M * m2 }
        | P
        ;

    P(f32): Number { Number as f32 }
        | space* lparen E rparen space* { E }
        ;

    E(f32) : A ;
}
let parser = EParser::new();         // generate `EParser`
let mut context = EContext::new();   // create context
let mut userdata: i32 = 0;           // define userdata

let input_sequence = "1 + 2 * ( 3 + 4 )";

// start feeding tokens
for token in input_sequence.chars() {
    match context.feed(&parser, token, &mut userdata) {
        //                      ^^^^^   ^^^^^^^^^^^^ userdata passed here as `&mut i32`
        //                     feed token
        Ok(_) => {}
        Err(e) => {
            match e {
                EParseError::InvalidTerminal(invalid_terminal) => {
                    ...
                }
                EParseError::ReduceAction(error_from_reduce_action) => {
                    ...
                }
            }
            println!("{}", e);
            // println!( "{}", e.long_message( &parser, &context ) );
            return;
        }
    }
}
context.feed(&parser, '\0', &mut userdata).unwrap();    // feed `eof` token

let res = context.accept();   // get the value of start symbol
println!("{}", res);
println!("userdata: {}", userdata);

Readable error messages (with codespan)

images/error1.png images/error2.png

  • This error message is generated by the buildscript tool, not the procedural macros.

Visualized syntax tree

images/tree.png

  • With tree feature enabled.

detailed ParseError message

images/parse_error.png

  • With error feature enabled.

Syntax

See SYNTAX.md for details of grammar-definition syntax.

Executable

An executable version of buildscript tool is available rustylr.

cargo install rustylr

$ rustylr --help

Contribution

  • Any contribution is welcome.
  • Please feel free to open an issue or pull request.

License (Since 2.8.0)

Either of

Dependencies

~0–7MB
~36K SLoC