21 stable releases
Version | Released |
---|---|
2.7.14 | Oct 14, 2024 |
2.7.11 | Jul 2, 2024 |
2.7.8 | Mar 2, 2024 |
2.7.5 | Oct 24, 2023 |
0.0.0 | Nov 14, 2022 |
pest. The Elegant Parser
pest is a general-purpose parser written in Rust with a focus on accessibility, correctness, and performance. It takes parsing expression grammars (PEGs) as input, which are similar in spirit to regular expressions but offer the enhanced expressivity needed to parse complex languages.
Getting started
The recommended way to start parsing with pest is to read the official book.
Other helpful resources:
- API reference on docs.rs
- play with grammars and share them on our fiddle
- find previous common questions answered or ask questions on GitHub Discussions
- leave feedback, ask questions, or greet us on Gitter or Discord
Example
The following is an example of a grammar for a list of alphanumeric identifiers in which no identifier starts with a digit:
alpha = { 'a'..'z' | 'A'..'Z' }
digit = { '0'..'9' }
ident = { !digit ~ (alpha | digit)+ }
ident_list = _{ ident ~ (" " ~ ident)* }
          // ^
          // ident_list rule is silent which means it produces no tokens
Grammars are saved in separate .pest files which are never mixed with procedural code. This results in an always up-to-date formalization of a language that is easy to read and maintain.
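To make the semantics of the grammar above concrete, the following is a hand-written recognizer that mirrors each rule using only the standard library. It is a sketch for illustration, not how pest is implemented and not the code pest generates. The functions alpha, digit, ident, and ident_list are hypothetical helpers named after the rules; each one takes a position in the input and returns the position after a successful match, or None.

```rust
/// alpha = { 'a'..'z' | 'A'..'Z' }
fn alpha(input: &[u8], pos: usize) -> Option<usize> {
    match input.get(pos) {
        Some(b) if b.is_ascii_alphabetic() => Some(pos + 1),
        _ => None,
    }
}

/// digit = { '0'..'9' }
fn digit(input: &[u8], pos: usize) -> Option<usize> {
    match input.get(pos) {
        Some(b) if b.is_ascii_digit() => Some(pos + 1),
        _ => None,
    }
}

/// ident = { !digit ~ (alpha | digit)+ }
fn ident(input: &[u8], pos: usize) -> Option<usize> {
    // Negative lookahead (!digit): fail if a digit matches here, consuming nothing.
    if digit(input, pos).is_some() {
        return None;
    }
    // One-or-more repetition over an ordered choice: alpha is tried before digit.
    let mut end = alpha(input, pos).or_else(|| digit(input, pos))?;
    while let Some(next) = alpha(input, end).or_else(|| digit(input, end)) {
        end = next;
    }
    Some(end)
}

/// ident_list = _{ ident ~ (" " ~ ident)* }
/// Returns the byte ranges of the matched identifiers.
fn ident_list(input: &str) -> Option<Vec<(usize, usize)>> {
    let bytes = input.as_bytes();
    let mut end = ident(bytes, 0)?;
    let mut spans = vec![(0, end)];
    // Zero-or-more repetition: stop at the first miss without failing.
    while bytes.get(end) == Some(&b' ') {
        match ident(bytes, end + 1) {
            Some(next) => {
                spans.push((end + 1, next));
                end = next;
            }
            None => break,
        }
    }
    Some(spans)
}

fn main() {
    println!("{:?}", ident_list("a1 b2"));
    println!("{:?}", ident_list("123"));
}
```

Writing this by hand mixes the grammar into procedural code, which is exactly what keeping the grammar in a separate .pest file avoids.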
Meaningful error reporting
Based on the grammar definition, the parser also includes automatic error reporting. For the example above, the input "123" will result in:
thread 'main' panicked at ' --> 1:1
  |
1 | 123
  | ^---
  |
  = unexpected digit', src/main.rs:12
while "ab *" will result in:
thread 'main' panicked at ' --> 1:1
  |
1 | ab *
  | ^---
  |
  = expected ident', src/main.rs:12
These error messages can be obtained from their default Display implementation, e.g. panic!("{}", parser_result.unwrap_err()) or println!("{}", e).
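Both calls work because "{}" formats any value whose type implements std::fmt::Display. The sketch below shows that mechanism with the standard library only. ParseError and parse are hypothetical stand-ins invented for this example, not pest's error type or API, and the message layout is deliberately simpler than pest's.

```rust
use std::fmt;

// Hypothetical stand-in for a parse error; not pest's actual error type.
#[derive(Debug)]
struct ParseError {
    line: usize,
    column: usize,
    message: String,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}:{}: {}", self.line, self.column, self.message)
    }
}

// Hypothetical parser that rejects input starting with a digit.
fn parse(input: &str) -> Result<&str, ParseError> {
    if input.starts_with(|c: char| c.is_ascii_digit()) {
        return Err(ParseError {
            line: 1,
            column: 1,
            message: "unexpected digit".to_string(),
        });
    }
    Ok(input)
}

fn main() {
    match parse("123") {
        Ok(text) => println!("parsed {}", text),
        // `{}` invokes the Display implementation defined above.
        Err(e) => println!("{}", e),
    }
}
```

Handling the Err case with println! rather than panic! lets a program report the problem and continue.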
Pairs API
The grammar can be used to derive a Parser implementation automatically. Parsing returns an iterator of nested token pairs:
use pest_derive::Parser;
use pest::Parser;

#[derive(Parser)]
#[grammar = "ident.pest"]
struct IdentParser;

fn main() {
    let pairs = IdentParser::parse(Rule::ident_list, "a1 b2").unwrap_or_else(|e| panic!("{}", e));

    // Because ident_list is silent, the iterator will contain idents
    for pair in pairs {
        // A pair is a combination of the rule which matched and a span of input
        println!("Rule: {:?}", pair.as_rule());
        println!("Span: {:?}", pair.as_span());
        println!("Text: {}", pair.as_str());

        // A pair can be converted to an iterator of the tokens which make it up:
        for inner_pair in pair.into_inner() {
            match inner_pair.as_rule() {
                Rule::alpha => println!("Letter: {}", inner_pair.as_str()),
                Rule::digit => println!("Digit: {}", inner_pair.as_str()),
                _ => unreachable!()
            };
        }
    }
}
This produces the following output:
Rule: ident
Span: Span { start: 0, end: 2 }
Text: a1
Letter: a
Digit: 1
Rule: ident
Span: Span { start: 3, end: 5 }
Text: b2
Letter: b
Digit: 2
Defining multiple parsers in a single file
The current automatic Parser derivation generates an enum named Rule alongside each parser struct. Defining several structs that derive Parser in the same scope would therefore cause name conflicts. One possible way around this is to put each parser struct in a separate namespace:
mod a {
    // Each module needs the derive macro in scope.
    use pest_derive::Parser;

    #[derive(Parser)]
    #[grammar = "a.pest"]
    pub struct ParserA;
}

mod b {
    use pest_derive::Parser;

    #[derive(Parser)]
    #[grammar = "b.pest"]
    pub struct ParserB;
}
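The effect of the separate namespaces can be sketched without pest at all. In the example below, two hand-written enums stand in for the Rule enums that the derive would generate. The variant names ident and number and the describe_* helpers are hypothetical. Because each enum lives in its own module, both can be called Rule and are referred to as a::Rule and b::Rule.

```rust
mod a {
    // Stand-in for the enum derived from a.pest (variant is hypothetical).
    #[allow(non_camel_case_types)]
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Rule {
        ident,
    }
}

mod b {
    // Stand-in for the enum derived from b.pest (variant is hypothetical).
    #[allow(non_camel_case_types)]
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Rule {
        number,
    }
}

// The module path disambiguates the two types that share the name `Rule`.
fn describe_a(rule: a::Rule) -> String {
    format!("a::{:?}", rule)
}

fn describe_b(rule: b::Rule) -> String {
    format!("b::{:?}", rule)
}

fn main() {
    println!("{}", describe_a(a::Rule::ident));
    println!("{}", describe_b(b::Rule::number));
}
```

Declaring both enums at the top level instead would fail to compile, because the name Rule would be defined twice in one scope.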
Other features
- Precedence climbing
- Input handling
- Custom errors
- Runs on stable Rust
Projects using pest
You can find more projects and ecosystem tools in the awesome-pest repo.
- pest_meta (bootstrapped)
- AshPaper
- brain
- cicada
- comrak
- elastic-rs
- graphql-parser
- handlebars-rust
- hexdino
- Huia
- insta
- jql
- json5-rs
- mt940
- Myoxine
- py_literal
- rouler
- RuSh
- rs_pbrt
- stache
- tera
- ui_gen
- ukhasnet-parser
- ZoKrates
- Vector
- AutoCorrect
- yaml-peg
- qubit
- caith (a dice roller crate)
- Melody
- json5-nodes
- prisma
Minimum Supported Rust Version (MSRV)
This library should always compile with default features on Rust 1.61.0.
no_std support
The pest and pest_derive crates can be built without the Rust standard library and target embedded environments. To do so, you need to disable their default features. In your Cargo.toml, you can specify it as follows:
[dependencies]
# ...
pest = { version = "2", default-features = false }
pest_derive = { version = "2", default-features = false }
If you want to build these crates in the pest repository's workspace, you can pass the --no-default-features flag to cargo and specify these crates using the --package (-p) flag. For example:
$ cargo build --target thumbv7em-none-eabihf --no-default-features -p pest
$ cargo bootstrap
$ cargo build --target thumbv7em-none-eabihf --no-default-features -p pest_derive
Special thanks
A special round of applause goes to Prof. Marius Minea for his guidance and to all pest contributors, some of whom are none other than my friends.