2 releases (1 stable)

1.0.0	Sep 27, 2020
0.1.0	May 23, 2020

#1583 in Parser implementations

GPL-3.0 license

130KB
2K SLoC

dpr

Evolution of... dynparser

Basic execution flow

  Text -> Parsing -> Transform -> Text

More info about the peg syntax bellow.

Usage

Add to cargo.toml

[dependencies]
dpr = "0.1.0"
# dpr = {git = "https://github.com/jleahred/dpr" }

Wach examples below

Modifications

    0.1.0 First version

TODO

Adding external functions
don't needed to be multiexpr pub(crate)struct Transf2Expr { pub(crate)mexpr: MultiExpr,
remove and or multiexpr when only one option (and/or)

About

Giveng a peg grammar extended, it will verify the input and can generate an output based on transformation rules

But let's see by examples

Simple example

Starting with this peg

Peg:

        main    =   char+
        char    =   'a'     -> A
                /   'b'     -> B
                /   .

Given this input

Input:

aaacbbabdef

We got as result:

Output:

AAAcBBABdef

Addition calculator example

Peg:

        main    =   expr

        expr    =   num:num                -> PUSH $(num)$(:endl)
                    (op:op  expr:expr)?    -> $(expr)EXEC $(op)$(:endl)

        op      =   '+'     -> ADD
                /   '-'     -> SUB

        num     =   [0-9]+  ('.' [0-9])?

Input:

    1+2-3

Output:

    PUSH 1
    PUSH 2
    PUSH 3
    EXEC SUB
    EXEC ADD

Execution flow

Basic text trasnformation flow.


   DSL flow


 .--------.
 |  peg   |
 |  user  |
 '--------'
      |
      v
 .--------.
 |  GEN   |
 | rules  |
 '--------'
      |                .----------.
      |                |  input   |
      |                |   user   |
      |                '----------'
      |                      |
      |                      v
      |                .----------.
      |                |  parse   |
      '--------------->|          |
                       '----------'
                             |
                             v
                        .---------.
                        | replace |
                        |         |
                        '---------'
                             |
                             v
                        .--------.
                        | OUTPUT |
                        |        |
                        '--------'

The rust code for first example...

extern crate dpr;

fn main() -> Result<(), dpr::Error> {
    let result = dpr::Peg::new(
        "
        main    =   char+
        char    =   'a'     -> A
                /   'b'     -> B
                /   .
    ",
    )
    .gen_rules()?
    .parse("aaacbbabdef")?
    .replace()?
    //  ...
    ;

    println!("{:#?}", result);
    Ok(())
}

PEG rules grammar

You saw some examples, let see in detail

token	Description
`=`	On left, symbol, on right expresion defining symbol
`symbol`	It's an string without quotes, no spaces, and ascii
`.`	Any char
`"..."`	Literal delimited by quotes
`<space>`	Separate tokens and Rule concatenation (`and` operation)
`/`	Or operation
`(...)`	A expression composed of sub expresions
`?`	One optional
`*`	Repeat 0 or more
`+`	Repeat 1 or more
`!`	negate expression, continue if not followed without consume
`&`	verify it follows..., but not consuming
`[...]`	Match chars. It's a list or ranges (or both)
`->`	after the arrow, we have the transformation rule
`:`	To give a name, in order to use later in transformation
error(...)	This let's you to define an error message when this rule is satisfied

Below there is the grammar witch define the valid peg inputs. BTW, this grammar has been parsed to generate the code to parse itself ;-)

Let's see by example

Rules by example

A simple literal string.

main = "Hello world"

Concatenation (and)

main = "Hello "  "world"

Referencing symbols

Symbol

main = hi
hi   = "Hello world"

Or conditions /

main = "hello" / "hi"

Or multiline

main
    = "hello"
    / "hi"
    / "hola"

Or multiline 2

main = "hello"
     / "hi"
     / "hola"

Or disorganized

main = "hello"
     / "hi" / "hola"

Parenthesis

main = ("hello" / "hi")  " world"

Just multiline

Multiline1

main
    = ("hello" / "hi")  " world"

Multiline2

main
    = ("hello" / "hi")
    " world"

Multiline3

main = ("hello" / "hi")
     " world"

It is recomended to use or operator / on each new line and = on first line, like

Multiline organized

main = ("hello" / "hi")  " world"
     / "bye"

One optional

main = ("hello" / "hi")  " world"?

Repetitions

main         = one_or_more_a / zero_or_many_b
one_or_more  = "a"+
zero_or_many = "b"*

Negation will not move current possition

Next example will consume all chars till get an "a"

Negation

main = (!"a" .)* "a"

Consume till

comment = "//" (!"\n" .)*
        / "/*" (!"*/" .)* "*/"

Match a set of chars. Chars can be defined by range.

number  = digit+ ("." digit+)?
digit   = [0-9]
a_or_b  = [ab]
id      = [_a-zA-Z][_a-zA-Z0-9]*

a_or_b_or_digit  = [ab0-9]

Simple recursion

one or more "a" recursive

as  = "a" as
    / "a"

//  simplified with `+`
ak = "a"+

Recursion to match parentheses

Recursion match par

match_par = "(" match_par ")"
          / "(" ")"

In order to produce custom errors, you have to use error(...) constructor

In next example, the system will complain with parenthesis error if they are unbalanced

    parenth         =   '('  _  expr  _  (  ')'
                                         /  error("unbalanced parethesis: missing ')'")
                                         )

As you can see, if you can run the rule to close properly the parenthesis, everything is OK, in other case, custom error message will be produced

Replacing

You can set the replace rules with ->

        op      =   '+'     -> ADD
                /   '-'     -> SUB

When + will be found and validated, it will be replaced by ADD

        expr    =   num:num                -> PUSH $(num)$(:endl)
                    (op:op  expr:expr)?    -> $(expr)EXEC $(op)$(:endl)

To refer to parsed chunk, you can name it using :

When refering to a symbol, you don't need to give a name

Next examples, are equivalent

        expr    =   num:num                -> PUSH $(num)$(:endl)
                    (op:op  expr:expr)?    -> $(expr)EXEC $(op)$(:endl)

        expr    =   num            -> PUSH $(num)$(:endl)
                    (op  expr)?    -> $(expr)EXEC $(op)$(:endl)

The arrow will work with current line. If you need to use trasnsformations over some lines, you will have to use (...)

There is a grammar to parse the peg grammars that could be an example on file gcode/peg2code.rs

After the arrow, you will have the transformation rule.

Replacing tokens: Things inside $(...) will be replaced. Text outside it, will be written as it

Replacing tokens can refer to parsed text by name or by position.

           -> $(num)

This will look for a name called num defined on left side to write it on output

Next line will also look for names, but on rep_symbol will not complain it it doesn't exists

    rep_or_unary    =   atom_or_par  rep_symbol?    -> $(?rep_symbol)$(atom_or_par)

You can also refer an element by position

           -> $(.1)

You can also refer to functions starting the replacing token with :

        expr    =   num            -> $(:endl)

Predefined functions are...

(Watch on replace.rs to see full replace functions)

        "endl" => "\n",
        "spc" => " ",
        "_" => " ",
        "tab" => "\t",
        "(" => "\t",
        // "now" => "pending",
        _ => "?unknown_fn?",

Example

        expr    =   num            -> PUSH $(num)$(:endl)
                    (op  expr)?    -> $(.2)EXEC $(.1)$(:endl)

You can define your own functions (aka external functions)

In next example we created the replacement token el

fn main() -> Result<(), dpr::Error> {
    let result = dpr::Peg::new(
        "
        main    =   char+
        char    =   'a'     -> $(:el)A
                /   'b'     -> $(:el)B
                /   ch:.    -> $(:el)$(ch)
    ",
    )
    .gen_rules()?
    .parse("aaacbbabdef")?
    .replace(Some(&dpr::FnCallBack(custom_funtions)))?
    //  ...
    ;

    println!("{:#?}", result);
    println!("{}", result.str());
    Ok(())
}

fn custom_funtions(fn_txt: &str) -> Option<String> {
    match fn_txt {
        "el" => Some("\n".to_string()),
        _ => None,
    }
}

Full math expresion compiler example

What is a parser without an math expresion calculator?

Obiously, it's necessary to consider the operator priority, operator asociativity and parenthesis, and negative numbers and negative expresions

extern crate dpr;

fn main() -> Result<(), dpr::Error> {
    let result = dpr::Peg::new(
        r#"
        main    =   expr

        expr    =   term    (
                            _  add_op   _  term     ->$(term)$(add_op)
                            )*

        term    =   factor  (
                            _  mult_op  _  factor   ->$(factor)$(mult_op)
                            )*

        factor  =   pow     (
                            _  pow_op   _  subexpr  ->$(subexpr)$(pow_op)
                            )*

        pow     =   subexpr (
                            _  pow_op   _  pow  ->$(pow)$(pow_op)
                            )*

        subexpr =   '(' _ expr _                    ->$(expr)
                                (  ')'              ->$(:none)
                                /  error("parenthesis error")
                        )
                /   number                        ->PUSH $(number)$(:endl)
                /   '-' _ subexpr                 ->PUSH 0$(:endl)$(subexpr)SUB$(:endl)

        number  =   ([0-9]+  ('.' [0-9])?)

        add_op  =   '+'     ->EXEC ADD$(:endl)
                /   '-'     ->EXEC SUB$(:endl)

        mult_op =   '*'     ->EXEC MUL$(:endl)
                /   '/'     ->EXEC DIV$(:endl)

        pow_op  =   '^'     ->EXEC POW$(:endl)

        _       = ' '*
        "#,
    )
    .gen_rules()?
    .parse("-(-1+2* 3^5 ^(- 2 ) -7)+8")?
    .replace()?
    //  ...
    ;

    println!("{:#?}", result);
    println!("{}", result.str());
    Ok(())
}

The output is a program for a stack machine, composed of a command with a parameter...

PUSH 0
PUSH 0
PUSH 1
EXEC SUB
PUSH 2
PUSH 3
PUSH 5
PUSH 0
PUSH 2
EXEC SUB
EXEC POW
EXEC POW
EXEC MUL
EXEC ADD
PUSH 7
EXEC SUB
EXEC SUB
PUSH 8
EXEC ADD

Full peg grammar doc spec

At the moment it's...

(for an updated reference, open peg2code.rs file :-)

fn text_peg2code() -> &'static str {
    r#"
    /*      A peg grammar to parse peg grammars
     *
     */

    main            =   grammar                                     -> $(grammar)EOP
    grammar         =   rule+

    symbol          =   [_a-zA-Z0-9] [_'"a-zA-Z0-9]*

    rule            =   _  rule_name  _  '='  _  expr  _eol _       -> RULE$(:endl)$(rule_name)$(:endl)$(expr)

    rule_name       =   symbol

    expr            =   or                              -> OR$(:endl)$(or)CLOSE_MEXPR$(:endl)

    or              =   _  and                          -> AND$(:endl)$(and)CLOSE_MEXPR$(:endl)
                        ( _  '/'  _  or )?              -> $(or)

    and             =   error
                    /   (andline  transf2   and:(
                            _                ->$(:none)
                            !(rule_name   _  ('=' / '{'))   and )?)             -> TRANSF2$(:endl)$(transf2)EOTRANSF2$(:endl)AND$(:endl)$(andline)CLOSE_MEXPR$(:endl)$(and)
                    /   andline     (  
                                       ( ' ' / comment )*   eol+   _            -> $(:none)
                                       !( rule_name _   ('=' / '{') )   and 
                                    )?

    error           =   'error' _  '('  _  literal  _  ')'      -> ERROR$(:endl)$(literal)$(:endl)


    andline         =   andchunk  (     
                                        ' '+  ->$(:none)
                                        ( error / andchunk )
                                  )*

    andchunk        =   name   e:rep_or_unary                 -> NAMED$(:endl)$(name)$(:endl)$(e)
                    /            rep_or_unary
                        

    //  this is the and separator
    _1              =   ' ' / eol                   -> $(:none)

    //  repetitions or unary operator
    rep_or_unary    =   atom_or_par  rep_symbol?    -> $(?rep_symbol)$(atom_or_par)
                    //   atom_or_par                -> $(atom_or_par)
                    /   '!' atom_or_par             -> NEGATE$(:endl)$(atom_or_par)
                    /   '&' atom_or_par             -> PEEK$(:endl)$(atom_or_par)

    rep_symbol      =   '*'     -> REPEAT$(:endl)0$(:endl)inf$(:endl)
                    /   '+'     -> REPEAT$(:endl)1$(:endl)inf$(:endl)
                    /   '?'     -> REPEAT$(:endl)0$(:endl)1$(:endl)

    atom_or_par     =   atom / parenth

    parenth         =   '('  _  expr  _                 -> $(expr)
                                         (  ')'         -> $(:none)
                                         /  error("unbalanced parethesis: missing ')'")
                                         )

    atom            =   a:literal             -> ATOM$(:endl)LIT$(:endl)$(a)$(:endl)
                    /   a:match               -> MATCH$(:endl)$(a)
                    /   a:rule_name           -> ATOM$(:endl)RULREF$(:endl)$(a)$(:endl)
                    /     dot                 -> ATOM$(:endl)DOT$(:endl)
                                    //  as rule_name can start with a '.', dot has to be after rule_name

    literal         =  lit_noesc  /  lit_esc

    lit_noesc       =  _'   l:(  !_' .  )*   _'        -> $(l)

    _'              =   "'"

    lit_esc         =   (_"
                            l:(   esc_char
                              /   hex_char
                              /   !_" .
                              )*
                        _")                             -> $(l)

    _"              =   '"'

    esc_char        =   '\r'
                    /   '\n'
                    /   '\t'
                    /   '\\'
                    /   '\\"'

    hex_char        =   '\0x' [0-9A-F] [0-9A-F]

    eol             =   "\r\n"  /  "\n"  /  "\r"
    _eol            =   (' ' / comment)*  eol

    match           =   '['     -> $(:none)
                            (
                                mchars  b:(mbetween*)       -> CHARS$(:endl)$(mchars)$(:endl)BETW$(:endl)$(b)EOBETW$(:endl)
                                / b:(mbetween+)             -> BETW$(:endl)$(b)EOBETW$(:endl)
                            )
                        ']'                -> $(:none)

    mchars          =   (!']' !(. '-') .)+

    mbetween        =   f:.  '-'  s:.                 -> $(f)$(:endl)$(s)$(:endl)

    dot             =   '.'

    _               =   (
                            (  ' '
                            /   eol
                            /   comment
                            )*
                        )                                  -> $(:none)

    comment         =   (   line_comment
                        /   mline_comment
                        )                                  -> $(:none)

    line_comment    =   '//' (!eol .)*

    mline_comment   =   '/*' (!'*/' .)* '*/'

    name            =   symbol ":"                         -> $(symbol)

    transf2         =   _1 _  '->'  ' '*    -> $(:none)
                        transf_rule         -> $(transf_rule)
                        &eol

    transf_rule     =   ( tmpl_text  /  tmpl_rule )+

    tmpl_text       =   t:( (!("$(" / eol) .)+ )                -> TEXT$(:endl)$(t)$(:endl)

    tmpl_rule       =   "$("          -> $(:none)
                            (
                                    //  by name optional
                                  '?'  symbol                   ->NAMED_OPT$(:endl)$(symbol)$(:endl)
                                    //  by name
                                /  symbol                       ->NAMED$(:endl)$(symbol)$(:endl)
                                    //  by pos
                                /   "."  pos:([0-9]+)           ->POS$(:endl)$(symbol)$(pos)$(:endl)
                                    //  by function
                                /   ":"  ->$(:none)
                                      fn:((!(")" / eol) .)+)    ->FUNCT$(:endl)$(fn)$(:endl)
                              )
                        ")"                                     ->$(:none)
    "#
}

Hacking the code

As you can see, the code to start parsing the peg input, is written in a text peg file

How is it possible?

At the moment, the rules_for_pegcode is...

    r#"symbol"# => or!(and!(ematch!(chlist r#"_"#  , from 'a', to 'z' , from 'A', to 'Z' , from '0', to '9' ), rep!(ematch!(chlist r#"_'""#  , from 'a', to 'z' , from 'A', to 'Z' , from '0', to '9' ), 0)))
    , r#"transf_rule"# => or!(and!(rep!(or!(and!(ref_rule!(r#"tmpl_text"#)), and!(ref_rule!(r#"tmpl_rule"#))), 1)))
    , r#"mbetween"# => or!(and!(transf2!( and!( and!(named!("f", dot!()), lit!("-"), named!("s", dot!())) ) , t2rules!(t2_byname!("f"), t2_funct!("endl"), t2_byname!("s"), t2_funct!("endl"), ) )))
    , r#"line_comment"# => or!(and!(lit!("//"), rep!(or!(and!(not!(ref_rule!(r#"eol"#)), dot!())), 0)))
    , r#"lit_noesc"# => or!(and!(transf2!( and!( and!(ref_rule!(r#"_'"#), named!("l", rep!(or!(and!(not!(ref_rule!(r#"_'"#)), dot!())), 0)), ref_rule!(r#"_'"#)) ) , t2rules!(t2_byname!("l"), ) )))
    , r#"error"# => or!(and!(transf2!( and!( and!(lit!("error"), ref_rule!(r#"_"#), lit!("("), ref_rule!(r#"_"#), ref_rule!(r#"literal"#), ref_rule!(r#"_"#), lit!(")")) ) , t2rules!(t2_text!("ERROR"), t2_funct!("endl"), t2_byname!("literal"), t2_funct!("endl"), ) )))
    , r#"atom_or_par"# => or!(and!(ref_rule!(r#"atom"#)), and!(ref_rule!(r#"parenth"#)))
    , r#"tmpl_text"# => or!(and!(transf2!( and!( and!(named!("t", or!(and!(rep!(or!(and!(not!(or!(and!(lit!("$(")), and!(ref_rule!(r#"eol"#)))), dot!())), 1))))) ) , t2rules!(t2_text!("TEXT"), t2_funct!("endl"), t2_byname!("t"), t2_funct!("endl"), ) )))
    , r#"atom"# => or!(and!(transf2!( and!( and!(named!("a", ref_rule!(r#"literal"#))) ) , t2rules!(t2_text!("ATOM"), t2_funct!("endl"), t2_text!("LIT"), t2_funct!("endl"), t2_byname!("a"), t2_funct!("endl"), ) )), and!(transf2!( and!( and!(named!("a", ref_rule!(r#"match"#))) ) , t2rules!(t2_text!("MATCH"), t2_funct!("endl"), t2_byname!("a"), ) )), and!(transf2!( and!( and!(named!("a", ref_rule!(r#"rule_name"#))) ) , t2rules!(t2_text!("ATOM"), t2_funct!("endl"), t2_text!("RULREF"), t2_funct!("endl"), t2_byname!("a"), t2_funct!("endl"), ) )), and!(transf2!( and!( and!(ref_rule!(r#"dot"#)) ) , t2rules!(t2_text!("ATOM"), t2_funct!("endl"), t2_text!("DOT"), t2_funct!("endl"), ) )))
    , r#"grammar"# => or!(and!(rep!(ref_rule!(r#"rule"#), 1)))
    , r#"dot"# => or!(and!(lit!(".")))
    , r#"eol"# => or!(and!(lit!("\r\n")), and!(lit!("\n")), and!(lit!("\r")))
    , r#"expr"# => or!(and!(transf2!( and!( and!(ref_rule!(r#"or"#)) ) , t2rules!(t2_text!("OR"), t2_funct!("endl"), t2_byname!("or"), t2_text!("CLOSE_MEXPR"), t2_funct!("endl"), ) )))
    , r#"name"# => or!(and!(transf2!( and!( and!(ref_rule!(r#"symbol"#), lit!(":")) ) , t2rules!(t2_byname!("symbol"), ) )))
    , r#"literal"# => or!(and!(ref_rule!(r#"lit_noesc"#)), and!(ref_rule!(r#"lit_esc"#)))
    , r#"rule"# => or!(and!(transf2!( and!( and!(ref_rule!(r#"_"#), ref_rule!(r#"rule_name"#), ref_rule!(r#"_"#), lit!("="), ref_rule!(r#"_"#), ref_rule!(r#"expr"#), ref_rule!(r#"_eol"#), ref_rule!(r#"_"#)) ) , t2rules!(t2_text!("RULE"), t2_funct!("endl"), t2_byname!("rule_name"), t2_funct!("endl"), t2_byname!("expr"), ) )))
    , r#"andchunk"# => or!(and!(transf2!( and!( and!(ref_rule!(r#"name"#), named!("e", ref_rule!(r#"rep_or_unary"#))) ) , t2rules!(t2_text!("NAMED"), t2_funct!("endl"), t2_byname!("name"), t2_funct!("endl"), t2_byname!("e"), ) )), and!(ref_rule!(r#"rep_or_unary"#)))
    , r#"_""# => or!(and!(lit!("\"")))
    , r#"mline_comment"# => or!(and!(lit!("/*"), rep!(or!(and!(not!(lit!("*/")), dot!())), 0), lit!("*/")))
    , r#"andline"# => or!(and!(ref_rule!(r#"andchunk"#), rep!(or!(and!(transf2!( and!( and!(rep!(lit!(" "), 1)) ) , t2rules!(t2_funct!("none"), ) ), or!(and!(ref_rule!(r#"error"#)), and!(ref_rule!(r#"andchunk"#))))), 0)))
    , r#"hex_char"# => or!(and!(lit!("\0x"), ematch!(chlist r#""#  , from '0', to '9' , from 'A', to 'F' ), ematch!(chlist r#""#  , from '0', to '9' , from 'A', to 'F' )))
    , r#"mchars"# => or!(and!(rep!(or!(and!(not!(lit!("]")), not!(or!(and!(dot!(), lit!("-")))), dot!())), 1)))
    , r#"transf2"# => or!(and!(transf2!( and!( and!(ref_rule!(r#"_1"#), ref_rule!(r#"_"#), lit!("->"), rep!(lit!(" "), 0)) ) , t2rules!(t2_funct!("none"), ) ), transf2!( and!( and!(ref_rule!(r#"transf_rule"#)) ) , t2rules!(t2_byname!("transf_rule"), ) ), peek!(ref_rule!(r#"eol"#))))
    , r#"_'"# => or!(and!(lit!("'")))
    , r#"and"# => or!(and!(ref_rule!(r#"error"#)), and!(transf2!( and!( and!(or!(and!(ref_rule!(r#"andline"#), ref_rule!(r#"transf2"#), named!("and", rep!(or!(and!(transf2!( and!( and!(ref_rule!(r#"_"#)) ) , t2rules!(t2_funct!("none"), ) ), not!(or!(and!(ref_rule!(r#"rule_name"#), ref_rule!(r#"_"#), or!(and!(lit!("=")), and!(lit!("{")))))), ref_rule!(r#"and"#))), 0, 1))))) ) , t2rules!(t2_text!("TRANSF2"), t2_funct!("endl"), t2_byname!("transf2"), t2_text!("EOTRANSF2"), t2_funct!("endl"), t2_text!("AND"), t2_funct!("endl"), t2_byname!("andline"), t2_text!("CLOSE_MEXPR"), t2_funct!("endl"), t2_byname!("and"), ) )), and!(ref_rule!(r#"andline"#), rep!(or!(and!(transf2!( and!( and!(rep!(or!(and!(lit!(" ")), and!(ref_rule!(r#"comment"#))), 0), rep!(ref_rule!(r#"eol"#), 1), ref_rule!(r#"_"#)) ) , t2rules!(t2_funct!("none"), ) ), not!(or!(and!(ref_rule!(r#"rule_name"#), ref_rule!(r#"_"#), or!(and!(lit!("=")), and!(lit!("{")))))), ref_rule!(r#"and"#))), 0, 1)))
    , r#"_"# => or!(and!(transf2!( and!( and!(or!(and!(rep!(or!(and!(lit!(" ")), and!(ref_rule!(r#"eol"#)), and!(ref_rule!(r#"comment"#))), 0)))) ) , t2rules!(t2_funct!("none"), ) )))
    , r#"or"# => or!(and!(transf2!( and!( and!(ref_rule!(r#"_"#), ref_rule!(r#"and"#)) ) , t2rules!(t2_text!("AND"), t2_funct!("endl"), t2_byname!("and"), t2_text!("CLOSE_MEXPR"), t2_funct!("endl"), ) ), transf2!( and!( and!(rep!(or!(and!(ref_rule!(r#"_"#), lit!("/"), ref_rule!(r#"_"#), ref_rule!(r#"or"#))), 0, 1)) ) , t2rules!(t2_byname!("or"), ) )))
    , r#"_1"# => or!(and!(lit!(" ")), and!(transf2!( and!( and!(ref_rule!(r#"eol"#)) ) , t2rules!(t2_funct!("none"), ) )))
    , r#"lit_esc"# => or!(and!(transf2!( and!( and!(or!(and!(ref_rule!(r#"_""#), named!("l", rep!(or!(and!(ref_rule!(r#"esc_char"#)), and!(ref_rule!(r#"hex_char"#)), and!(not!(ref_rule!(r#"_""#)), dot!())), 0)), ref_rule!(r#"_""#)))) ) , t2rules!(t2_byname!("l"), ) )))
    , r#"rep_symbol"# => or!(and!(transf2!( and!( and!(lit!("*")) ) , t2rules!(t2_text!("REPEAT"), t2_funct!("endl"), t2_text!("0"), t2_funct!("endl"), t2_text!("inf"), t2_funct!("endl"), ) )), and!(transf2!( and!( and!(lit!("+")) ) , t2rules!(t2_text!("REPEAT"), t2_funct!("endl"), t2_text!("1"), t2_funct!("endl"), t2_text!("inf"), t2_funct!("endl"), ) )), and!(transf2!( and!( and!(lit!("?")) ) , t2rules!(t2_text!("REPEAT"), t2_funct!("endl"), t2_text!("0"), t2_funct!("endl"), t2_text!("1"), t2_funct!("endl"), ) )))
    , r#"parenth"# => or!(and!(transf2!( and!( and!(lit!("("), ref_rule!(r#"_"#), ref_rule!(r#"expr"#), ref_rule!(r#"_"#)) ) , t2rules!(t2_byname!("expr"), ) ), or!(and!(transf2!( and!( and!(lit!(")")) ) , t2rules!(t2_funct!("none"), ) )), and!(error!("unbalanced parethesis: missing ')'")))))
    , r#"comment"# => or!(and!(transf2!( and!( and!(or!(and!(ref_rule!(r#"line_comment"#)), and!(ref_rule!(r#"mline_comment"#)))) ) , t2rules!(t2_funct!("none"), ) )))
    , r#"_eol"# => or!(and!(rep!(or!(and!(lit!(" ")), and!(ref_rule!(r#"comment"#))), 0), ref_rule!(r#"eol"#)))
    , r#"esc_char"# => or!(and!(lit!("\r")), and!(lit!("\n")), and!(lit!("\t")), and!(lit!("\\")), and!(lit!("\\\"")))
    , r#"match"# => or!(and!(transf2!( and!( and!(lit!("[")) ) , t2rules!(t2_funct!("none"), ) ), or!(and!(transf2!( and!( and!(ref_rule!(r#"mchars"#), named!("b", or!(and!(rep!(ref_rule!(r#"mbetween"#), 0))))) ) , t2rules!(t2_text!("CHARS"), t2_funct!("endl"), t2_byname!("mchars"), t2_funct!("endl"), t2_text!("BETW"), t2_funct!("endl"), t2_byname!("b"), t2_text!("EOBETW"), t2_funct!("endl"), ) )), and!(transf2!( and!( and!(named!("b", or!(and!(rep!(ref_rule!(r#"mbetween"#), 1))))) ) , t2rules!(t2_text!("BETW"), t2_funct!("endl"), t2_byname!("b"), t2_text!("EOBETW"), t2_funct!("endl"), ) ))), transf2!( and!( and!(lit!("]")) ) , t2rules!(t2_funct!("none"), ) )))
    , r#"rep_or_unary"# => or!(and!(transf2!( and!( and!(ref_rule!(r#"atom_or_par"#), rep!(ref_rule!(r#"rep_symbol"#), 0, 1)) ) , t2rules!(t2_byname_opt!("rep_symbol"), t2_byname!("atom_or_par"), ) )), and!(transf2!( and!( and!(lit!("!"), ref_rule!(r#"atom_or_par"#)) ) , t2rules!(t2_text!("NEGATE"), t2_funct!("endl"), t2_byname!("atom_or_par"), ) )), and!(transf2!( and!( and!(lit!("&"), ref_rule!(r#"atom_or_par"#)) ) , t2rules!(t2_text!("PEEK"), t2_funct!("endl"), t2_byname!("atom_or_par"), ) )))
    , r#"tmpl_rule"# => or!(and!(transf2!( and!( and!(lit!("$(")) ) , t2rules!(t2_funct!("none"), ) ), or!(and!(transf2!( and!( and!(lit!("?"), ref_rule!(r#"symbol"#)) ) , t2rules!(t2_text!("NAMED_OPT"), t2_funct!("endl"), t2_byname!("symbol"), t2_funct!("endl"), ) )), and!(transf2!( and!( and!(ref_rule!(r#"symbol"#)) ) , t2rules!(t2_text!("NAMED"), t2_funct!("endl"), t2_byname!("symbol"), t2_funct!("endl"), ) )), and!(transf2!( and!( and!(lit!("."), named!("pos", or!(and!(rep!(ematch!(chlist r#""#  , from '0', to '9' ), 1))))) ) , t2rules!(t2_text!("POS"), t2_funct!("endl"), t2_byname!("symbol"), t2_byname!("pos"), t2_funct!("endl"), ) )), and!(transf2!( and!( and!(lit!(":")) ) , t2rules!(t2_funct!("none"), ) ), transf2!( and!( and!(named!("fn", or!(and!(rep!(or!(and!(not!(or!(and!(lit!(")")), and!(ref_rule!(r#"eol"#)))), dot!())), 1))))) ) , t2rules!(t2_text!("FUNCT"), t2_funct!("endl"), t2_byname!("fn"), t2_funct!("endl"), ) ))), transf2!( and!( and!(lit!(")")) ) , t2rules!(t2_funct!("none"), ) )))
    , r#"main"# => or!(and!(transf2!( and!( and!(ref_rule!(r#"grammar"#)) ) , t2rules!(t2_byname!("grammar"), t2_text!("EOP"), ) )))
    , r#"rule_name"# => or!(and!(ref_rule!(r#"symbol"#)))
    )
}

Writting it by hand, it's dificult.

Isn't this program desineg to receive a text peg grammar and an text input and produce a text output?

IR

IR is from Intermediate Representation

Why???

Once we parse the input, we have an AST. We could process the AST but...

The AST is strongly coupled to the grammar. Most of the times we modify the grammar, we will need to modify the code to process the AST.

Some times the grammar modification will be a syntax modif, or adding some feature that requiere some syntax modification, therefore a different AST but all, or almost all of the concepts remain the same.

Imagine if we wanted to add de function sqrt to the math expresion compiler. We will need to modify the rules generator in order to process the new AST

To decouple the peg grammar from parsing the AST, we will create the IR (Intermediate Representation)

How to get the IR will be defined in the own peg grammar as transformation rules.

An interpreter of the IR will produce the rules in memory. Later, we can generate de rust code from the rules produced, or we could have a specific interpreter to generate them, but it's nice to get it from rust data structures

To develop this feature... we need a parser, and a code generator... Hey!!! I do it. dpr does that!!!

How to generate the IR

  peg_grammar()
    .parse(peg_grammar())
    .gen_rules()
    .replace()

The peg_grammar will have in transformation rules the intructions to generate the IR

Thanks to the IR it's easy to modify this program, and we don't need to deal with the AST coupled to the peg-grammar

Let's see step by step

Creating rules...

extern crate dpr;

fn main() -> Result<(), dpr::Error> {
    let result = dpr::Peg::new(
        "
        main    =   char+
        char    =   'a'     -> A
                /   'b'     -> B
                /   .
    ",
    )
    .gen_rules()?
    // .parse("aaacbbabdef")?
    // .replace()?
    //  ...
    ;

    println!("{:#?}", result);
    Ok(())
}

Produce a set of rules like...

SetOfRules(
    {
        "main": And(
            MultiExpr(
                [
                    Repeat(
                        RepInfo {
                            expression: RuleName(
                                "char",
                            ),
                            min: NRep(
                                1,
                            ),
                            max: None,
                        },
                    ),
                ],
            ),
        ),
        "char": Or(
            MultiExpr(
                [
                    And(
                        MultiExpr(
                            [
                                MetaExpr(
                                    Transf2(
                                        Transf2Expr {
                                            mexpr: MultiExpr(
                                                [
                                                    Simple(
                                                        Literal(
                                                            "a",
                                                        ),
                                                    ),
                                                ],
                                            ),
                                            transf2_rules: "A",
                                        },
                                    ),
                                ),
                            ],
                        ),
                    ),
                    And(
                        MultiExpr(
                            [
                                MetaExpr(
                                    Transf2(
                                        Transf2Expr {
                                            mexpr: MultiExpr(
                                                [
                                                    Simple(
                                                        Literal(
                                                            "b",
                                                        ),
                                                    ),
                                                ],
                                            ),
                                            transf2_rules: "B",
                                        },
                                    ),
                                ),
                            ],
                        ),
                    ),
                    And(
                        MultiExpr(
                            [
                                Simple(
                                    Dot,
                                ),
                            ],
                        ),
                    ),
                ],
            ),
        ),
    },
)

This set of rules will let us to parse and generate the AST for any input

Next step, parsing the input with generated rules...

Creating rules... (With a simplified input in order to reduce the output size)

extern crate dpr;

fn main() -> Result<(), dpr::Error> {
    let result = dpr::Peg::new(
        "
        main    =   char+
        char    =   'a'     -> A
                /   'b'     -> B
                /   .
    ",
    )
    .gen_rules()?
    .parse("acb")?
    // .replace()?
    //  ...
    ;

    println!("{:#?}", result);
    Ok(())
}

Now you can see de produced AST

And running the transformations...

extern crate dpr;

fn main() -> Result<(), dpr::Error> {
    let result = dpr::Peg::new(
        "
        main    =   char+
        char    =   'a'     -> A
                /   'b'     -> B
                /   .
    ",
    )
    .gen_rules()?
    .parse("acb")?
    .replace()?
    //  ...
    ;

    println!("{:#?}", result);
    Ok(())
}

"AcB"

Dependencies

~1.5MB
~32K SLoC