12 releases (breaking)
0.9.0 | Jul 10, 2024 |
---|---|
0.8.0 | Oct 17, 2023 |
0.7.0 | Oct 16, 2023 |
0.6.2 | Jul 30, 2023 |
0.0.1 | Nov 13, 2014 |
#22 in Parser tooling
530 downloads per month
Used in 14 crates
(12 directly)
165KB
4K
SLoC
Rust library for manipulating context-free grammars. You can check the documentation here.
Usage
Add this to your Cargo.toml:
[dependencies]
cfg = "0.9"
If you want grammar serialization support with miniserde
, include the feature like this:
[dependencies]
cfg = { version = "0.9", features = ["serialize"] }
If you want weighted generation support, include the feature like this:
[dependencies]
cfg = { version = "0.9", features = ["weighted-generation"] }
If you want LL(1) classification support, include the feature like this:
[dependencies]
cfg = { version = "0.9", features = ["ll"] }
Analyzing and modifying grammars
The following features are implemented thus far:
- rich rule building
- sequence rules,
- precedenced rules.
- conversions to a shape similar to Chomsky Normal Form
- grammar binarization,
- nulling rule elimination for binarized grammars.
- sanity
- cycle detection and elimination,
- useless rule detection and elimination,
- unused symbol removal.
- analysis for LR(1), LL(1) and others
- FIRST and FOLLOW set computation,
- minimal distance computation,
- LL(1) classification.
- tools for probabilistic grammars
- generation for PCFGs + negative zero-width lookahead.
Building grammars
cfg
includes an interface that simplifies grammar construction.
Generating symbols
The easiest way of generating symbols is with the sym
method. The library is unaware
of the start symbol.
let mut grammar: Cfg = Cfg::new();
let (start, expr, identifier, number,
plus, multiply, power, l_paren, r_paren, digit) = grammar.sym();
Building grammar rules
Rules have a LHS symbol and zero or more RHS symbols.
Example BNF:
start ::= expr | identifier l_paren expr r_paren
With our library:
grammar.rule(start).rhs([expr])
.rhs([identifier, l_paren, expr, r_paren]);
Building sequence rules
Sequence rules have a LHS symbol, a RHS symbol, a range of repetitions, and optional separation. Aside from separation, they closely resemble regular expression repetitions.
Example BNF:
number ::= digit+
With our library:
SequencesToProductions::new(&mut grammar).sequence(number).inclusive(1, None).rhs(digit);
Building precedenced rules
Precedenced rules are the most convenient way to describe operators. Once
built, they are immediately rewritten into basic grammar rules, and unique
symbols are generated. Operator associativity can be set to Right
or
Group
. It's Left
by default.
use cfg::precedence::Associativity::{Right, Group};
grammar.precedenced_rule(expr)
.rhs([number])
.rhs([identifier])
.associativity(Group)
.rhs([l_paren, expr, r_paren])
.lower_precedence()
.associativity(Right)
.rhs([expr, power, expr])
.lower_precedence()
.rhs([expr, multiply, expr])
.lower_precedence()
.rhs([expr, plus, expr]);
Using a custom grammar representation
Your grammar type has to implement the RuleContainer
trait.
License
Dual-licensed for compatibility with the Rust project.
Licensed under the Apache License Version 2.0: http://www.apache.org/licenses/LICENSE-2.0, or the MIT license: http://opensource.org/licenses/MIT, at your option.
Dependencies
~125–480KB