#regex #regular #nfa #dfa #automata

cfg-regex

Conversion from a regular expression to a context-free grammar

3 unstable releases

Uses old Rust 2015

0.1.1 Oct 4, 2019
0.1.0 Oct 4, 2019
0.0.0 May 20, 2016

#1486 in Algorithms


Used in 5 crates (2 directly)

MIT/Apache

10KB
222 lines

gearley • Build status Latest version

An Earley parser engine. Work in progress. You can check the documentation here.

This engine is meant to be a foundation of an optimized parser generator.

Gearley is largely inspired by the Marpa parser by Jeffrey Kegler.

Extending gearley

The grammar is stored in a serialized form. You may create or deserialize it yourself. Grammar construction is implemented in the cfg library.

The recognizer provides an interface for writing a custom parse forest. Or you may reuse the default parse forest algorithm, but write your own code for controlling rule order, and for storing evaluated values within each tree node.

Yet another interface gives immediate control over rule completion. You may reject certain completed rules or modify their parse forests as the parse progresses.

Gearley is perfectly extensible on every level.

Glossary

Recognizer

Gearley term Marpa term Alternative term
dot dotted rule --
earleme earleme input location
item Earley item situation
origin origin distance
rule history rule semantics --

| complete | complete | accept |

Dot — a position in the grammar, which is an integer.

Earleme — scalar position, currently equivalent to the input location index.

Item — a value that consists of a dot, an origin and a bocage node.

Origin — the Earley set number where a rule was predicted. Always smaller than the current Earley set ID for non-predicted items.

Rule history — a value that contains an action number and other information about semantics. Each rule carries its own history.

Parse forest

Gearley term Marpa term Alternative term
bocage bocage Shared Packed Parse Forest
depth-first bocage Abstract Syntax Forest --
sum node glade OR node
product node factoring AND node
leaf node bocage symbol leaf node
root node peak glade top node

Bocage — a parse forest in the form of a Directed Acyclic Graph.

Depth-first bocage — a bocage that is traversed by evaluating one whole bocage node at a time.

Sum node — a node that sums the number of trees in the forest.

Product node — a node that may multiply the number of trees in the forest.

Leaf node — a terminal node that begins a single tree in the forest.

Root node — a node that is used as a parse result.

License

Dual-licensed for compatibility with the Rust project.

Licensed under the Apache License Version 2.0: http://www.apache.org/licenses/LICENSE-2.0, or the MIT license: http://opensource.org/licenses/MIT, at your option.

Dependencies

~2MB
~55K SLoC