11 releases (7 breaking)

0.8.0 Oct 17, 2023
0.6.2 Jul 30, 2023
0.5.0 Nov 23, 2019
0.3.0 May 16, 2016
0.0.1 Nov 13, 2014

MIT/Apache

155KB
4K SLoC

cfg

Context-free grammar tools.

A Rust library for manipulating context-free grammars. See the crate's documentation for API details.

Analyzing and modifying grammars

The following features are implemented thus far:

  • rich rule building
    • sequence rules,
    • precedenced rules.
  • conversions to a shape similar to Chomsky Normal Form
    • grammar binarization,
    • nulling rule elimination for binarized grammars.
  • sanity
    • cycle detection and elimination,
    • useless rule detection and elimination,
    • unused symbol removal.
  • analysis for LR(1), LL(1) and others
    • FIRST and FOLLOW set computation,
    • minimal distance computation,
    • LL(1) classification.
  • tools for probabilistic grammars
    • generation for PCFGs + negative zero-width lookahead.
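
To make one of the analyses listed above concrete, here is a minimal, self-contained sketch of FIRST set computation by fixed-point iteration. It does not use this crate's API: the Rule struct, nullable_set, first_sets and toy_grammar are hypothetical names invented for illustration, and symbols are plain strings. Any symbol that never appears as a LHS is assumed to be a terminal, and nullability is tracked separately instead of storing an epsilon marker in the sets.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A toy rule: one LHS symbol and zero or more RHS symbols.
/// (Hypothetical type, not part of the crate.)
struct Rule {
    lhs: &'static str,
    rhs: Vec<&'static str>,
}

/// Nonterminals that can derive the empty string, found by iterating
/// until no new nullable symbol is discovered.
fn nullable_set(rules: &[Rule]) -> BTreeSet<&'static str> {
    let mut nullable = BTreeSet::new();
    let mut changed = true;
    while changed {
        changed = false;
        for rule in rules {
            if !nullable.contains(rule.lhs) && rule.rhs.iter().all(|s| nullable.contains(s)) {
                nullable.insert(rule.lhs);
                changed = true;
            }
        }
    }
    nullable
}

/// FIRST sets for every nonterminal. A symbol that never appears as a
/// LHS is treated as a terminal (an assumption of this sketch).
fn first_sets(rules: &[Rule]) -> BTreeMap<&'static str, BTreeSet<&'static str>> {
    let nullable = nullable_set(rules);
    let mut first: BTreeMap<&'static str, BTreeSet<&'static str>> = BTreeMap::new();
    for rule in rules {
        first.entry(rule.lhs).or_default();
    }
    let mut changed = true;
    while changed {
        changed = false;
        for rule in rules {
            // Collect what this rule contributes to FIRST(lhs).
            let mut additions = BTreeSet::new();
            for sym in &rule.rhs {
                match first.get(sym) {
                    Some(set) => additions.extend(set.iter().copied()),
                    None => {
                        additions.insert(*sym); // terminal
                    }
                }
                // Only look past this symbol if it can derive the empty string.
                if !nullable.contains(sym) {
                    break;
                }
            }
            let target = first.get_mut(rule.lhs).unwrap();
            for terminal in additions {
                if target.insert(terminal) {
                    changed = true;
                }
            }
        }
    }
    first
}

/// start ::= sign number;  sign ::= minus | (empty);  number ::= digit | number digit
fn toy_grammar() -> Vec<Rule> {
    vec![
        Rule { lhs: "start", rhs: vec!["sign", "number"] },
        Rule { lhs: "sign", rhs: vec!["minus"] },
        Rule { lhs: "sign", rhs: vec![] },
        Rule { lhs: "number", rhs: vec!["digit"] },
        Rule { lhs: "number", rhs: vec!["number", "digit"] },
    ]
}

fn main() {
    for (symbol, set) in first_sets(&toy_grammar()) {
        println!("FIRST({symbol}) = {set:?}");
    }
}
```

Because sign is nullable in the toy grammar, FIRST(start) picks up both minus and digit. The crate's own implementation may represent symbols and sets quite differently.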

Building grammars

cfg includes an interface that simplifies grammar construction.

Generating symbols

The easiest way to generate symbols is with the sym method. The library does not treat any symbol as the start symbol; start in the example below is an ordinary symbol.

let mut grammar: Cfg = Cfg::new();
let (start, expr, identifier, number,
     plus, multiply, power, l_paren, r_paren, digit) = grammar.sym();

Building grammar rules

Rules have a LHS symbol and zero or more RHS symbols.

Example BNF:

start ::= expr | identifier l_paren expr r_paren

With our library:

grammar.rule(start).rhs([expr])
                   .rhs([identifier, l_paren, expr, r_paren]);
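
To show what the chained call above amounts to, here is a minimal, self-contained sketch of a builder in the same style. It is not the crate's implementation: ToyGrammar, ToyRuleBuilder and new_symbol are hypothetical names, and symbols are plain integers. The point is only that each rhs call appends one rule that shares the builder's LHS.

```rust
/// Symbols are plain integers in this sketch.
type Symbol = u32;

/// Hypothetical grammar type: a symbol counter plus a flat list of rules.
#[derive(Debug, Default)]
struct ToyGrammar {
    next_symbol: Symbol,
    rules: Vec<(Symbol, Vec<Symbol>)>,
}

/// Hypothetical builder that remembers the LHS between calls.
struct ToyRuleBuilder<'a> {
    grammar: &'a mut ToyGrammar,
    lhs: Symbol,
}

impl ToyGrammar {
    /// Hands out a fresh symbol.
    fn new_symbol(&mut self) -> Symbol {
        let symbol = self.next_symbol;
        self.next_symbol += 1;
        symbol
    }

    /// Starts building rules for the given LHS.
    fn rule(&mut self, lhs: Symbol) -> ToyRuleBuilder<'_> {
        ToyRuleBuilder { grammar: self, lhs }
    }
}

impl<'a> ToyRuleBuilder<'a> {
    /// Adds one alternative `lhs ::= syms` and returns the builder for chaining.
    fn rhs<const N: usize>(&mut self, syms: [Symbol; N]) -> &mut Self {
        self.grammar.rules.push((self.lhs, syms.to_vec()));
        self
    }
}

fn main() {
    let mut grammar = ToyGrammar::default();
    let start = grammar.new_symbol();
    let expr = grammar.new_symbol();
    let identifier = grammar.new_symbol();
    let l_paren = grammar.new_symbol();
    let r_paren = grammar.new_symbol();

    // start ::= expr | identifier l_paren expr r_paren
    grammar
        .rule(start)
        .rhs([expr])
        .rhs([identifier, l_paren, expr, r_paren]);

    for (lhs, rhs) in &grammar.rules {
        println!("{lhs} ::= {rhs:?}");
    }
}
```

The BNF alternation therefore becomes two separate rules with the same LHS.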

Building sequence rules

Sequence rules have a LHS symbol, a RHS symbol, a range of repetitions, and optional separation. Aside from separation, they closely resemble regular expression repetitions.

Example BNF:

number ::= digit+

With our library:

grammar.sequence(number).inclusive(1, None).rhs(digit);
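
A sequence rule can always be expressed with plain rules. The sketch below shows one common rewrite for the "one or more" case, with an optional separator, using left recursion. It is an illustration under stated assumptions rather than the crate's actual rewrite: rewrite_one_or_more is a hypothetical function, symbols are strings, and the crate may introduce intermediate symbols or handle other repetition ranges differently.

```rust
type Symbol = &'static str;
type Rule = (Symbol, Vec<Symbol>);

/// Rewrites `lhs ::= item+` (optionally separated) into two plain rules:
///   lhs ::= item
///   lhs ::= lhs [separator] item
fn rewrite_one_or_more(lhs: Symbol, item: Symbol, separator: Option<Symbol>) -> Vec<Rule> {
    let mut recursive = vec![lhs];
    recursive.extend(separator); // adds the separator only when one is given
    recursive.push(item);
    vec![(lhs, vec![item]), (lhs, recursive)]
}

fn main() {
    // number ::= digit+
    for (lhs, rhs) in rewrite_one_or_more("number", "digit", None) {
        println!("{lhs} ::= {}", rhs.join(" "));
    }
    // args ::= arg (comma arg)*
    for (lhs, rhs) in rewrite_one_or_more("args", "arg", Some("comma")) {
        println!("{lhs} ::= {}", rhs.join(" "));
    }
}
```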

Building precedenced rules

Precedenced rules are the most convenient way to describe operators. Once built, they are immediately rewritten into basic grammar rules, and unique symbols are generated. Operator associativity can be set to Right or Group. It's Left by default.

use cfg::precedence::Associativity::{Right, Group};

grammar.precedenced_rule(expr)
           .rhs([number])
           .rhs([identifier])
           .associativity(Group)
           .rhs([l_paren, expr, r_paren])
       .lower_precedence()
           .associativity(Right)
           .rhs([expr, power, expr])
       .lower_precedence()
           .rhs([expr, multiply, expr])
       .lower_precedence()
           .rhs([expr, plus, expr]);
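
The rewrite into basic rules can be pictured with the classic technique of giving each precedence level its own symbol. The following self-contained sketch applies that technique to the example above. It is an assumption about the general shape of the result, not the crate's algorithm: Assoc, Alternative, stratify and the generated names such as expr_0 are hypothetical, Group is assumed to reset nested operands to the loosest level, and the tightest level is assumed to contain only atoms and Group alternatives.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Assoc {
    Left,
    Right,
    Group,
}

/// One alternative of a precedenced rule (hypothetical type).
struct Alternative {
    assoc: Assoc,
    rhs: Vec<&'static str>,
}

type Rule = (String, Vec<String>);

/// Rewrites precedence levels into plain rules. `levels[0]` binds tightest;
/// the loosest level keeps the original LHS name.
fn stratify(lhs: &str, levels: &[Vec<Alternative>]) -> Vec<Rule> {
    let top = levels.len().saturating_sub(1);
    let name = |i: usize| if i == top { lhs.to_string() } else { format!("{lhs}_{i}") };
    let mut rules: Vec<Rule> = Vec::new();
    for (i, alternatives) in levels.iter().enumerate() {
        let tighter = name(i.saturating_sub(1));
        if i > 0 {
            // Every level can fall through to the next tighter one.
            rules.push((name(i), vec![tighter.clone()]));
        }
        for alt in alternatives {
            let positions: Vec<usize> = alt
                .rhs
                .iter()
                .enumerate()
                .filter(|(_, s)| **s == lhs)
                .map(|(p, _)| p)
                .collect();
            // The operand that may recurse at the same level.
            let same_level = match alt.assoc {
                Assoc::Left => positions.first().copied(),
                Assoc::Right => positions.last().copied(),
                Assoc::Group => None,
            };
            let rhs: Vec<String> = alt
                .rhs
                .iter()
                .enumerate()
                .map(|(p, s)| {
                    if *s != lhs {
                        s.to_string()
                    } else if alt.assoc == Assoc::Group {
                        lhs.to_string() // grouped operands restart at the loosest level
                    } else if Some(p) == same_level {
                        name(i)
                    } else {
                        tighter.clone()
                    }
                })
                .collect();
            rules.push((name(i), rhs));
        }
    }
    rules
}

/// The precedence levels from the example above, tightest first.
fn readme_example() -> Vec<Vec<Alternative>> {
    let alt = |assoc, rhs: &[&'static str]| Alternative { assoc, rhs: rhs.to_vec() };
    vec![
        vec![
            alt(Assoc::Left, &["number"]),
            alt(Assoc::Left, &["identifier"]),
            alt(Assoc::Group, &["l_paren", "expr", "r_paren"]),
        ],
        vec![alt(Assoc::Right, &["expr", "power", "expr"])],
        vec![alt(Assoc::Left, &["expr", "multiply", "expr"])],
        vec![alt(Assoc::Left, &["expr", "plus", "expr"])],
    ]
}

fn render(rule: &Rule) -> String {
    format!("{} ::= {}", rule.0, rule.1.join(" "))
}

fn main() {
    for rule in stratify("expr", &readme_example()) {
        println!("{}", render(&rule));
    }
}
```

In this scheme a Right-associative operator recurses on its right operand and a Left-associative one on its left operand, which is what keeps the resulting grammar unambiguous.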

Using a custom grammar representation

To use your own grammar type, implement one trait on the type itself and two more on references to it:

  • implement RuleContainer for MyGrammar
  • implement RuleContainerRef for &'a MyGrammar
  • implement RuleContainerMut for &'a mut MyGrammar
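
The pattern of implementing one trait on the grammar type and further traits on its references can be sketched as follows. The trait names ToyContainer, ToyContainerRef and ToyContainerMut, their methods, and the fields of MyGrammar are hypothetical stand-ins chosen for illustration; the real RuleContainer traits have their own required items, which are described in the crate's documentation.

```rust
type Symbol = u32;
type Rule = (Symbol, Vec<Symbol>);

/// A custom grammar representation (fields are an assumption of this sketch).
struct MyGrammar {
    rules: Vec<Rule>,
}

/// Hypothetical stand-in for a trait implemented on the grammar itself.
trait ToyContainer {
    fn add_rule(&mut self, lhs: Symbol, rhs: &[Symbol]);
}

/// Hypothetical stand-in for a trait implemented on shared references.
trait ToyContainerRef<'a> {
    type Rules: Iterator<Item = &'a Rule>;
    fn rules(self) -> Self::Rules;
}

/// Hypothetical stand-in for a trait implemented on mutable references.
trait ToyContainerMut<'a> {
    type RulesMut: Iterator<Item = &'a mut Rule>;
    fn rules_mut(self) -> Self::RulesMut;
}

impl ToyContainer for MyGrammar {
    fn add_rule(&mut self, lhs: Symbol, rhs: &[Symbol]) {
        self.rules.push((lhs, rhs.to_vec()));
    }
}

impl<'a> ToyContainerRef<'a> for &'a MyGrammar {
    type Rules = std::slice::Iter<'a, Rule>;
    fn rules(self) -> Self::Rules {
        self.rules.iter()
    }
}

impl<'a> ToyContainerMut<'a> for &'a mut MyGrammar {
    type RulesMut = std::slice::IterMut<'a, Rule>;
    fn rules_mut(self) -> Self::RulesMut {
        self.rules.iter_mut()
    }
}

/// Generic code can be written against the reference trait alone.
fn count_rules<'a, G: ToyContainerRef<'a>>(grammar: G) -> usize {
    grammar.rules().count()
}

fn main() {
    let mut grammar = MyGrammar { rules: Vec::new() };
    grammar.add_rule(0, &[1, 2]);
    grammar.add_rule(0, &[]);
    println!("{} rules", count_rules(&grammar));
}
```

Putting the iteration traits on the references lets the iterator types borrow from the grammar for the lifetime 'a without requiring generic associated types.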

License

Dual-licensed for compatibility with the Rust project.

Licensed under the Apache License Version 2.0: http://www.apache.org/licenses/LICENSE-2.0, or the MIT license: http://opensource.org/licenses/MIT, at your option.
