77 releases (48 stable)
new 3.3.7 | Apr 15, 2025 |
---|---|
3.3.1 | Oct 26, 2024 |
2.7.2 | Sep 2, 2024 |
1.6.1 | Aug 13, 2024 |
0.5.2 | Jul 29, 2024 |
rusty_lr
A Yacc-like, procedural macro-based parser generator for Rust supporting LR(1), LALR(1), and GLR parsing strategies.
RustyLR enables you to define context-free grammars (CFGs) directly in Rust using macros or build scripts. It constructs deterministic finite automata (DFA) at compile time, ensuring efficient and reliable parsing.
Please refer to docs.rs for detailed examples and documentation.
Features
- Multiple Parsing Strategies: Supports LR(1), LALR(1), and GLR parsers.
- Procedural Macros: Define grammars using lr1! and lalr1! macros for compile-time parser generation.
- Build Script Integration: Generate parsers via build scripts for complex grammars with detailed error messages.
- Custom Reduce Actions: Define custom actions during reductions to build ASTs or perform computations.
- Grammar Conflict Detection: Automatically detects shift/reduce and reduce/reduce conflicts during parser generation, providing informative diagnostics to help resolve ambiguities.
Installation
Add RustyLR to your Cargo.toml:
[dependencies]
rusty_lr = "..."
To use the build script tools:
[build-dependencies]
rusty_lr = { version = "...", features = ["build"] }
Optionally, install the executable version:
cargo install rustylr
Quick Start
Using Procedural Macros
Define your grammar using the lr1! or lalr1! macro:
// this defines the `EParser` struct,
// where `E` is the start symbol
lr1! {
    %userdata i32;           // userdata type passed to the parser
    %tokentype char;         // token type; a sequence of `tokentype` is fed to the parser
    %start E;                // start symbol; its value is the final value of the parser
    %eof '\0';               // eof token; this token is used to finish parsing

    // ================= Token definitions =================
    %token zero '0';
    %token one '1';
    ...
    %token nine '9';
    %token plus '+';
    %token star '*';
    %token space ' ';

    %left [plus star];       // reduce-first for tokens 'plus' and 'star'

    // ================= Production rules =================
    Digit(char): [zero-nine]; // character set '0' to '9'

    Number(i32)               // production rule `Number` holds an `i32` value
        : space* Digit+ space* // `Number` is one or more `Digit` surrounded by zero or more spaces
        { Digit.into_iter().collect::<String>().parse().unwrap() }; // this will be the value of `Number` (i32) by this production rule

    A(f32)
        : A plus a2=A {
            *data += 1;       // access userdata by `data`
            println!( "{:?} {:?} {:?}", A, plus, a2 ); // any Rust code can be written here
            A + a2            // this will be the value of `A` (f32) by this production rule
        }
        | M
        ;

    M(f32): M star m2=M { M * m2 }
        | Number { Number as f32 } // Number is `i32`, so cast to `f32`
        ;

    E(f32) : A ;              // start symbol
}
This defines a simple arithmetic expression parser.
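To make the intended semantics of the grammar concrete, here is a hand-written evaluator that computes the same values. It is only a sketch using the standard library: it is not what RustyLR generates, and the names `Eval` and `evaluate` are hypothetical. It assumes that `star` binds tighter than `plus` (as the `A`/`M` layering implies) and that both operators associate to the left (as `%left` requests). The `additions` counter plays the role of the `i32` userdata that the grammar increments on every addition.

```rust
/// Hand-written evaluator mirroring the grammar's semantics (not RustyLR output).
struct Eval<'a> {
    chars: std::iter::Peekable<std::str::Chars<'a>>,
    additions: i32, // stands in for the `i32` userdata
}

impl<'a> Eval<'a> {
    fn new(input: &'a str) -> Self {
        Eval { chars: input.chars().peekable(), additions: 0 }
    }

    fn skip_spaces(&mut self) {
        while self.chars.peek() == Some(&' ') {
            self.chars.next();
        }
    }

    // Number: space* Digit+ space*
    fn number(&mut self) -> Option<f32> {
        self.skip_spaces();
        let mut digits = String::new();
        while let Some(c) = self.chars.peek().copied().filter(|c| c.is_ascii_digit()) {
            digits.push(c);
            self.chars.next();
        }
        self.skip_spaces();
        // An empty or overflowing digit run is rejected instead of panicking.
        let value: i32 = digits.parse().ok()?;
        Some(value as f32)
    }

    // M: M star M | Number   (left-associative)
    fn product(&mut self) -> Option<f32> {
        let mut acc = self.number()?;
        while self.chars.peek() == Some(&'*') {
            self.chars.next();
            acc *= self.number()?;
        }
        Some(acc)
    }

    // A: A plus A | M        (left-associative)
    fn sum(&mut self) -> Option<f32> {
        let mut acc = self.product()?;
        while self.chars.peek() == Some(&'+') {
            self.chars.next();
            acc += self.product()?;
            self.additions += 1;
        }
        Some(acc)
    }
}

/// Returns the value of the expression and the number of additions performed,
/// or `None` if the input is not a valid expression.
fn evaluate(input: &str) -> Option<(f32, i32)> {
    let mut eval = Eval::new(input);
    let value = eval.sum()?;
    if eval.chars.next().is_some() {
        return None; // trailing input that the grammar does not accept
    }
    Some((value, eval.additions))
}
```

With this sketch, `evaluate("1 + 2 * 3")` yields `Some((7.0, 1))`: the multiplication is grouped first, and one addition is counted.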
Using Build Script
For complex grammars, you can use a build script to generate the parser. This will provide more detailed error messages when conflicts occur.
1. Create a grammar file (e.g., src/parser.rs) with the following content:
// Rust code of `use` and type definitions
%% // start of grammar definition
%tokentype u8;
%start E;
%eof b'\0';
%token a b'a';
%token lparen b'(';
%token rparen b')';
E: lparen E rparen
| a;
...
2. Set up build.rs:
// build.rs
use rusty_lr::build;

fn main() {
    println!("cargo::rerun-if-changed=src/parser.rs");
    let output = format!("{}/parser.rs", std::env::var("OUT_DIR").unwrap());
    build::Builder::new()
        .file("src/parser.rs") // path to the input file
        .build(&output);       // path to the output file
}
3. Include the generated source code:
include!(concat!(env!("OUT_DIR"), "/parser.rs"));
4. Use the parser in your code:
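let mut parser = parser::EParser::new();   // create the <StartSymbol>Parser struct
let mut context = parser::EContext::new(); // create the <StartSymbol>Context struct
let mut userdata: i32 = 0;
// NOTE: the tokens fed here must have the grammar's `%tokentype`. This snippet feeds `char`;
// for a `u8` grammar, iterate over `input.bytes()` and feed `b'\0'` as EOF instead.
for b in input.chars() {
    match context.feed(&parser, b, &mut userdata) {
        Ok(_) => {}
        Err(e) => {
            eprintln!("error: {}", e);
            return;
        }
    }
}
println!("{:?}", context);
context.feed(&parser, 0 as char, &mut userdata).unwrap(); // feed EOF
let result: i32 = context.accept(); // get the value of the start symbol `E`; the type must match the one declared for `E`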
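The feed-then-accept flow above can be illustrated without the crate. The following is a minimal sketch of a push-style recognizer for the parenthesis grammar from step 1 (`E: lparen E rparen | a`, with `b'\0'` as EOF). `ToyContext` and `parse` are hypothetical names made up for this illustration, not RustyLR types, and the real generated context works from parser tables rather than hand-written state.

```rust
/// Toy push-style recognizer for `E: '(' E ')' | 'a'` with `b'\0'` as EOF.
/// `ToyContext` is a hypothetical stand-in, not a RustyLR type.
#[derive(Debug, Default)]
struct ToyContext {
    depth: usize,   // total number of '(' consumed
    open: usize,    // '(' that still await their ')'
    seen_a: bool,   // whether the innermost `a` has been consumed
    finished: bool, // whether EOF has been accepted
}

impl ToyContext {
    /// Consumes one token, returning an error on the first invalid token.
    fn feed(&mut self, token: u8) -> Result<(), String> {
        if self.finished {
            return Err("input after EOF".to_string());
        }
        match (token, self.seen_a) {
            (b'(', false) => {
                self.depth += 1;
                self.open += 1;
                Ok(())
            }
            (b'a', false) => {
                self.seen_a = true;
                Ok(())
            }
            (b')', true) if self.open > 0 => {
                self.open -= 1;
                Ok(())
            }
            (b'\0', true) if self.open == 0 => {
                self.finished = true;
                Ok(())
            }
            _ => Err(format!("unexpected token {:?}", token as char)),
        }
    }

    /// Returns the nesting depth once EOF has been accepted.
    fn accept(self) -> Option<usize> {
        if self.finished { Some(self.depth) } else { None }
    }
}

/// Feeds every byte, then EOF, and returns the accepted value.
fn parse(input: &[u8]) -> Result<usize, String> {
    let mut context = ToyContext::default();
    for &token in input {
        context.feed(token)?;
    }
    context.feed(b'\0')?; // feed EOF
    context.accept().ok_or_else(|| "parsing not finished".to_string())
}
```

The point of the push-style shape is that errors surface at the exact token that caused them, and the final value is only available after the EOF token has been fed.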
GLR Parsing
RustyLR offers built-in support for Generalized LR (GLR) parsing, enabling it to handle ambiguous or nondeterministic grammars that traditional LR(1) or LALR(1) parsers cannot process. See GLR.md for details.
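To see why ambiguity matters, consider the grammar `E: E '+' E | digit` with no precedence declarations. A deterministic LR(1) parser hits a shift/reduce conflict on it, because an input such as `1+1+1` has more than one parse tree. The sketch below is independent of RustyLR and uses a hypothetical function name; it simply counts how many parse trees that grammar admits for a given number of operands, which is the set of alternatives a GLR-style parser has to account for.

```rust
/// Counts the parse trees of the ambiguous grammar `E: E '+' E | digit`
/// for an input consisting of `operands` digits joined by '+'.
fn count_parse_trees(operands: usize) -> u64 {
    if operands == 0 {
        return 0; // the grammar does not derive the empty input
    }
    // trees[n] = number of parse trees deriving an input with n operands
    let mut trees = vec![0u64; operands + 1];
    trees[1] = 1; // E -> digit
    for n in 2..=operands {
        // E -> E '+' E: pick which '+' is the root, splitting n operands
        // into `left` on the left side and `n - left` on the right side.
        let total: u64 = (1..n).map(|left| trees[left] * trees[n - left]).sum();
        trees[n] = total;
    }
    trees[operands]
}
```

For three operands the count is 2, corresponding to `(1+1)+1` and `1+(1+1)`; the counts grow quickly (5 for four operands, 14 for five), which is why declaring associativity with `%left` is preferable whenever the grammar allows it.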
Examples
- Calculator: A calculator using u8 as token type.
- lua 5.4 syntax parser
- Bootstrap: rusty_lr syntax parser is written in rusty_lr itself.
Cargo Features
- build: Enable build script tools.
- fxhash: Use FXHashMap instead of std::collections::HashMap for parser tables.
- tree: Enable automatic syntax tree construction (for debugging purposes).
- error: Enable detailed parsing error messages (for debugging purposes).
Syntax
RustyLR's grammar syntax is inspired by traditional Yacc/Bison formats. See SYNTAX.md for details of grammar-definition syntax.
Contribution
- Any contribution is welcome.
- Please feel free to open an issue or pull request.
License (Since 2.8.0)
Either of
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
Images
Using the build script tools or the executable instead of procedural macros is highly recommended, because they generate readable error messages.
- Reduce/Reduce conflicts
- Shift/Reduce conflicts