12 unstable releases (5 breaking)
Version | Released |
---|---|
0.6.0 | May 20, 2024 |
0.5.1 | May 8, 2024 |
0.4.3 | Nov 21, 2021 |
0.3.1 | Oct 18, 2021 |
0.1.1 | Aug 8, 2021 |
#390 in #find
32 downloads per month
Used in gramatika
39KB
1K SLoC
Gramatika
A minimal toolkit for writing parsers with Rust
Motivation
Though they are powerful and useful in a lot of situations, I find parser generators to be kind of fiddly and onerous to work with for a variety of reasons. On the other hand, writing a parser by hand requires a ton of tedious boilerplate just to get off the ground.
This project is an attempt to find my Goldilocks zone (your mileage may vary) between automagic grammar-based tools and staring into the terrifying abyss of a blank lib.rs file. Currently, it provides a lexer generator that's dirt simple to use, some convenience macros, and some barebones parsing primitives inspired by syn: just enough to give you a rolling start and get out of your way.
Status
This crate has matured quite a bit since its inception and is currently being used in at least one real-world production codebase: a WGSL parser powering a language server protocol implementation for IDE tooling. That said, the API should still be considered unstable and subject to breaking changes between minor versions until version 1.0 is released.
Getting Started
A brief tutorial and API overview is available in the crate-level documentation.
You can also explore two fully-working, non-trivial example projects in this repo:
- examples/lox is a parser for the Lox programming language implemented with Gramatika's derive macros.
- examples/lox_manual_impl is a parser that manually implements Gramatika's traits by hand-writing all of the code that's normally generated by the derive macros. This is a great place to start if you're curious about the implementation details, or if you need to manually implement any of Gramatika's traits to cover a special use case.
Use Cases
Gramatika is ideally suited to languages that meet the following criteria:
- The text can be tokenized statelessly. That is, a particular pattern of characters can always be represented by the same token regardless of where it appears within the syntax tree.
  For example, languages in the XML family do not meet this condition, because only text within angle brackets (< and >), along with the brackets themselves, should be tokenized, with the remaining text being treated as verbatim "text nodes."
- The language can be parsed with an LL(1) or recursive descent parser with one lookahead token and limited back-tracking.
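To make the two criteria above concrete, here is a minimal, self-contained sketch in plain Rust. It does not use Gramatika's API at all: the `Token`, `tokenize`, `Parser`, and `evaluate` names are hypothetical and exist only for illustration. The lexer is stateless (a `+` is always `Token::Plus`, wherever it appears), and the parser chooses every production by peeking at a single token.

```rust
use std::iter::Peekable;
use std::vec::IntoIter;

/// Token kinds for a tiny arithmetic language (hypothetical, not Gramatika's types).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Num(i64),
    Plus,
    Star,
    LParen,
    RParen,
}

/// Stateless tokenizer: each character pattern maps to the same token
/// no matter where it appears in the input.
/// Integer overflow handling is omitted for brevity.
fn tokenize(src: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        match c {
            c if c.is_whitespace() => {
                chars.next();
            }
            '0'..='9' => {
                let mut n: i64 = 0;
                while let Some(d) = chars.peek().and_then(|c| c.to_digit(10)) {
                    n = n * 10 + i64::from(d);
                    chars.next();
                }
                tokens.push(Token::Num(n));
            }
            '+' => { tokens.push(Token::Plus); chars.next(); }
            '*' => { tokens.push(Token::Star); chars.next(); }
            '(' => { tokens.push(Token::LParen); chars.next(); }
            ')' => { tokens.push(Token::RParen); chars.next(); }
            other => return Err(format!("unexpected character {other:?}")),
        }
    }
    Ok(tokens)
}

/// Recursive descent parser that decides each step by peeking at one token.
/// For simplicity it evaluates the expression instead of building a syntax tree.
struct Parser {
    tokens: Peekable<IntoIter<Token>>,
}

impl Parser {
    fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens: tokens.into_iter().peekable() }
    }

    // expr := term ('+' term)*
    fn expr(&mut self) -> Result<i64, String> {
        let mut value = self.term()?;
        while self.tokens.peek() == Some(&Token::Plus) {
            self.tokens.next();
            value += self.term()?;
        }
        Ok(value)
    }

    // term := atom ('*' atom)*
    fn term(&mut self) -> Result<i64, String> {
        let mut value = self.atom()?;
        while self.tokens.peek() == Some(&Token::Star) {
            self.tokens.next();
            value *= self.atom()?;
        }
        Ok(value)
    }

    // atom := NUM | '(' expr ')'
    fn atom(&mut self) -> Result<i64, String> {
        match self.tokens.next() {
            Some(Token::Num(n)) => Ok(n),
            Some(Token::LParen) => {
                let value = self.expr()?;
                match self.tokens.next() {
                    Some(Token::RParen) => Ok(value),
                    other => Err(format!("expected ')', found {other:?}")),
                }
            }
            other => Err(format!("expected a number or '(', found {other:?}")),
        }
    }
}

/// Tokenizes and parses `src`, rejecting any tokens left over after the expression.
fn evaluate(src: &str) -> Result<i64, String> {
    let mut parser = Parser::new(tokenize(src)?);
    let value = parser.expr()?;
    match parser.tokens.next() {
        None => Ok(value),
        Some(extra) => Err(format!("unexpected trailing token {extra:?}")),
    }
}
```

In this sketch, `evaluate("2 + 3 * (4 + 1)")` returns `Ok(17)`. A language like XML would not fit this shape, because `tokenize` would need to know whether it is currently inside a tag before deciding how to treat each character.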
That said, Gramatika's APIs are flexible enough that, with some additional effort, it could be adapted to fit use cases that don't strictly meet these criteria. Feel free to get in touch if you'd like to use Gramatika for such a use case and would like some guidance or advice.
Dependencies
~5.5MB
~83K SLoC