4 releases
0.2.6 | Sep 22, 2024 |
---|---|
0.2.2 | Aug 31, 2024 |
0.2.1 | Aug 21, 2024 |
0.2.0 | Aug 16, 2024 |
#928 in Rust patterns
31KB
529 lines
Peggen
A parser generator for parsing expression grammar (PEG) that use inline macros to specify PEG operations.
How is it different from (...)?
/ | Conceptual | User Experience | Performance | Error Handling |
---|---|---|---|---|
PEST | PEST only annotates text. Peggen generates AST directly from your text. |
In most cases, you still want rust enum s for your AST, which is directly provided by Peggen, but you have to manually create enums from PEST rules. |
PEST use an optimizer to memorize your grammar rules, and use memorization for better performance; Peggen doesn't use memorization, arguably this gives better performance over memorization for most grammars. | / |
Chumsky | Chumsky provides parser combinators. Peggen is a parser generator. | Both Chumsky and Peggen provides ast directly. However, Peggen supports arena allocation. | Chumsky deallocates successful sub-rules when a rule fails; Peggen uses a internal representation to eliminate deallocation. | / |
LALRPOP | Peggen is PEG-based; LALRPOP uses LR(1) grammar. | Peggen is more intuitive to use than LALRPOP; LR(1) grammar is hard to extend and debug. | LALRPOP has better performance over Peggen. | LR(1) grammar can report errors far away from normally percepted cause; Peggen allows you to capture errors from customary cause. |
Performance
I roughly tested the peggen on a sample json file against chumsky.
CPU Model: Intel(R) Core(TM) i7-14700HX
Suprisingly, Peggen is faster than Chumsky.
Here are some numbers:
- Peggen : 867913 ns/iter
- Chumsky: 1555256 ns/iter
Example: Json Parser
You can write a json parser in the following several lines:
#[derive(Debug, ParseImpl, Space, Num, EnumAstImpl)]
pub enum Json {
#[rule(r"null")]
Null,
#[rule(r"{0:`false|true`}")]
Bool(bool),
#[rule(r"{0:`-?(0|[1-9][0-9]*)\.([0-9]+)`}")]
Flt(f32),
#[rule("{0:`0|-?[1-9][0-9]*`}")]
Num(i32),
#[rule(r#""{0:`[^"]*`}""#)]
Str(String),
#[rule(r#"\{ [*0: "{0:`[^"]*`}" : {1} , ][?0: "{0:`[^"]*`}" : {1} ] \}"#)]
Obj(RVec<(String, Json)>),
#[rule(r"\[ [*0: {0} , ][?0: {0} ] \]")]
Arr(RVec<Json>)
}
Roadmap
- Optimizations:
- Rule dispatch: filter rules by the first symbol, instead of trying each of them.
- Thinner tag: currently each tag in internal representation is 3-pointers wide, I want to make them thinner.
- Error Handling:
- Custom error handlers when error handlers fail.
Dependencies
~3–13MB
~126K SLoC