#parsing #parser #grammar #ast #peg

peginator

PEG parser generator for creating ASTs in Rust (runtime)

13 releases (4 breaking)

Uses new Rust 2021

0.6.0 Nov 28, 2022
0.5.1 Nov 28, 2022
0.4.0 Nov 21, 2022
0.3.0 Oct 4, 2022
0.1.2 May 18, 2022

#27 in Parser tooling

Download history 9/week @ 2022-08-19 7/week @ 2022-08-26 8/week @ 2022-09-02 12/week @ 2022-09-09 7/week @ 2022-09-16 112/week @ 2022-09-23 79/week @ 2022-09-30 35/week @ 2022-10-07 35/week @ 2022-10-14 72/week @ 2022-10-21 134/week @ 2022-10-28 107/week @ 2022-11-04 87/week @ 2022-11-11 170/week @ 2022-11-18 174/week @ 2022-11-25 24/week @ 2022-12-02

472 downloads per month
Used in 24 crates (19 directly)

MIT license

41KB
628 lines

Peginator

Peginator is a PEG (Parsing Expression Grammar) parser generator written in Rust. It is specifically made to parse into ASTs (Abstract Syntax Trees), as opposed to most, streaming-style parsers out there.

It generates both the tree structure and the parsing code that can create that tree from a &str. The generated parsing code is deliberately very simple straightforward Rust code, which is usually optimized very well by the compiler.

There is an opt-in memoization feature that makes it a proper packrat parser that can parse any input in linear time and space.

Left-recursion is also supported using said memoization feature (also opt-in).

Documentation

This documentation describes how peginator implements PEGs. A basic understanding of PEGs are assumed. There are good introductions on wikipedia or in the docs of other parser generators.

Peginator is bootstrapped using its own syntax and grammar file, which is somewhat easy-to-read.

Please see the syntax reference and the API documentation

The tests can also be used as examples.

Quick Start

The grammars for peginator are written in a syntax similar to EBNF (extended Backus-Naur form):

@export
FunctionDef = 'fn' name:Ident '(' param_list:ParamList ')' [ '->' return_value:Type ];

ParamList = self_param:SelfParam {',' params:Param} | params:Param  {',' params:Param} | ;

Param = name:Ident ':' typ: Type;

SelfParam = [ref_type:ReferenceMarker] 'self';

Type = [ref_type:ReferenceMarker] typename:Ident;

ReferenceMarker = @:MutableReference | @:ImmutableReference;

ImmutableReference = '&';
MutableReference = '&' 'mut';

@string
@no_skip_ws
Ident = {'a'..'z' | 'A'..'Z' | '_' | '0'..'9'};

Based on the above grammar, peginator will generate the following types:

pub struct FunctionDef {
    pub name: Ident,
    pub param_list: ParamList,
    pub return_value: Option<Type>,
}
pub struct ParamList {
    pub self_param: Option<SelfParam>,
    pub params: Vec<Param>,
}
pub struct Param {
    pub name: Ident,
    pub typ: Type,
}
pub struct SelfParam {
    pub ref_type: Option<ReferenceMarker>,
}
pub struct Type {
    pub ref_type: Option<ReferenceMarker>,
    pub typename: Ident,
}
pub enum ReferenceMarker {
    ImmutableReference(ImmutableReference),
    MutableReference(MutableReference),
}
pub struct ImmutableReference;
pub struct MutableReference;
pub type Ident = String;

impl PegParser for FunctionDef { /* omitted */ }

Parsing then looks like this:

FunctionDef::parse("fn example(&self, input:&str, rectified:&mut Rect) -> ExampleResult;")

Which results in the following structure:

FunctionDef {
    name: "example",
    param_list: ParamList {
        self_param: Some(SelfParam {
            ref_type: Some(ImmutableReference(ImmutableReference)),
        }),
        params: [
            Param {
                name: "input",
                typ: Type {
                    ref_type: Some(ImmutableReference(ImmutableReference)),
                    typename: "str",
                },
            },
            Param {
                name: "rectified",
                typ: Type {
                    ref_type: Some(MutableReference(MutableReference)),
                    typename: "Rect",
                },
            },
        ],
    },
    return_value: Some(Type {
        ref_type: None,
        typename: "ExampleResult",
    }),
}

Debugging

We have pretty errors, based on the first failure of the longest match (a'la python's parser):

Colors and stuff on a console

And parse tracing (opt-in, no cost if not used):

More colors and indentation

Integration

There are multiple ways to integrate a Peginator grammar to your project:

  • Compile your grammars directly with the peginator-cli binary
  • Inline your grammars with the peginate! macro from the peginator_macro package
  • Or you can use the buildscript helper

Contribution

At this point, I'd be happy if simply more people used this code. Please reach out if you need any help.

See also

The project is meant to be an almost drop-in replacement for Tatsu, and its fantastic Model Builder. This is why the grammar looks like the way it does.

There are a ton of other PEG parser implementations in Rust, please check them out. Non-exhaustive list in no particular order:

Special mention: lalrpop

License

Licensed under the MIT license

Dependencies

~125KB