14 releases
0.7.0 | Feb 11, 2024 |
---|---|
0.6.0 | Nov 28, 2022 |
0.5.1 |
|
0.2.0 | Jul 17, 2022 |
#45 in Parser tooling
136 downloads per month
Used in 24 crates
(17 directly)
43KB
654 lines
Peginator
Peginator is a PEG (Parsing Expression Grammar) parser generator written in Rust. It is specifically made to parse into ASTs (Abstract Syntax Trees), as opposed to most, streaming-style parsers out there.
It generates both the tree structure and the parsing code that can create that tree from
a &str
. The generated parsing code is deliberately very simple straightforward Rust
code, which is usually optimized very well by the compiler.
There is an opt-in memoization feature that makes it a proper packrat parser that can parse any input in linear time and space.
Left-recursion is also supported using said memoization feature (also opt-in).
Documentation
This documentation describes how peginator implements PEGs. A basic understanding of PEGs are assumed. There are good introductions on wikipedia or in the docs of other parser generators.
Peginator is bootstrapped using its own syntax and grammar file, which is somewhat easy-to-read.
Please see the syntax reference and the API documentation
The tests can also be used as examples.
Quick Start
The grammars for peginator are written in a syntax similar to EBNF (extended Backus-Naur form):
@export
FunctionDef = 'fn' name:Ident '(' param_list:ParamList ')' [ '->' return_value:Type ];
ParamList = self_param:SelfParam {',' params:Param} | params:Param {',' params:Param} | ;
Param = name:Ident ':' typ: Type;
SelfParam = [ref_type:ReferenceMarker] 'self';
Type = [ref_type:ReferenceMarker] typename:Ident;
ReferenceMarker = @:MutableReference | @:ImmutableReference;
ImmutableReference = '&';
MutableReference = '&' 'mut';
@string
@no_skip_ws
Ident = {'a'..'z' | 'A'..'Z' | '_' | '0'..'9'};
Based on the above grammar, peginator will generate the following types:
pub struct FunctionDef {
pub name: Ident,
pub param_list: ParamList,
pub return_value: Option<Type>,
}
pub struct ParamList {
pub self_param: Option<SelfParam>,
pub params: Vec<Param>,
}
pub struct Param {
pub name: Ident,
pub typ: Type,
}
pub struct SelfParam {
pub ref_type: Option<ReferenceMarker>,
}
pub struct Type {
pub ref_type: Option<ReferenceMarker>,
pub typename: Ident,
}
pub enum ReferenceMarker {
ImmutableReference(ImmutableReference),
MutableReference(MutableReference),
}
pub struct ImmutableReference;
pub struct MutableReference;
pub type Ident = String;
impl PegParser for FunctionDef { /* omitted */ }
Parsing then looks like this:
FunctionDef::parse("fn example(&self, input:&str, rectified:&mut Rect) -> ExampleResult;")
Which results in the following structure:
FunctionDef {
name: "example",
param_list: ParamList {
self_param: Some(SelfParam {
ref_type: Some(ImmutableReference(ImmutableReference)),
}),
params: [
Param {
name: "input",
typ: Type {
ref_type: Some(ImmutableReference(ImmutableReference)),
typename: "str",
},
},
Param {
name: "rectified",
typ: Type {
ref_type: Some(MutableReference(MutableReference)),
typename: "Rect",
},
},
],
},
return_value: Some(Type {
ref_type: None,
typename: "ExampleResult",
}),
}
Debugging
We have pretty errors, based on the first failure of the longest match (a'la python's parser):
And parse tracing (opt-in, no cost if not used):
Integration
There are multiple ways to integrate a Peginator grammar to your project:
- Compile your grammars directly with the
peginator-cli
binary - Inline your grammars with the
peginate!
macro from the peginator_macro package - Or you can use the buildscript helper
Railroad graphs
The Peginator CLI can also create a nice railroad graph of your grammar:
Contribution
At this point, I'd be happy if simply more people used this code. Please reach out if you need any help.
See also
The project is meant to be an almost drop-in replacement for Tatsu, and its fantastic Model Builder. This is why the grammar looks like the way it does.
There are a ton of other PEG parser implementations in Rust, please check them out. Non-exhaustive list in no particular order:
Special mention: lalrpop
License
Licensed under the MIT license
Dependencies
~0.1–6MB
~35K SLoC