spdl

API for compiling and using SPDL, a language to assist with parsing code into syntax trees

1 unstable release

0.0.1 Sep 25, 2020

MIT license

This crate contains an API for compiling and using the SPDL programming language. SPDL (pronounced "spiddle") stands for the Structured Parsing Description Language and is used to make parsing in a compiler much easier. The crate provides a function that compiles SPDL into a parser, which can then be used to turn code into syntax trees.

A syntactical instance, in this compiler's terms, is a syntactical structure describing how to parse at a given point. The generated SPDL parser contains a root syntactical instance for the whole file, along with a set of syntactical instances for variables. Each of these syntactical instances has its own function for parsing at a point.

The SPDLParser structure wraps a root syntactical instance that parses from the start of the code and is expected to cover the whole of it. It also wraps an array of all of the variables in the root syntactical instance. The SPDLParser structure has a get_syntax_tree function, which builds the syntax tree of the input code based on the description in its root syntactical instance and returns a result containing either the errors or the output syntax tree.

Each syntactical instance has a syntactical instance type and produces the result of parsing at a point in the input code. The following types of syntactical instances are currently available in SPDL:

  1. Set(&[SyntacticalInstance])

The "Set" syntactical instance type contains a sequence of syntactical instances to parse in order.

  2. ZeroOrMore(&SyntacticalInstance)

The "ZeroOrMore" syntactical instance type asks the parser to parse a syntactical instance as many times as it can before giving up.

  3. OneOrMore(&SyntacticalInstance)

The "OneOrMore" syntactical instance type is like ZeroOrMore, except that it requires at least one successful iteration of parsing in order to succeed.

  4. Either(&[SyntacticalInstance])

The "Either" syntactical instance type succeeds if any one of the syntactical instances in its set parses successfully at a point. It fails if none in the set are successful.

  5. Not(&SyntacticalInstance)

The "Not" syntactical instance type disallows a certain syntactical instance from parsing successfully at a point.

  6. Regular(&str)

The "Regular" syntactical instance type parses successfully at a point if the input at that point starts with the string described in this syntactical instance type.

  7. Search(&str)

The "Search" syntactical instance type serves as a runtime search for a syntactical instance with a specific name. Any invalid variable name in it is detected at compile time. This type is needed when two or more variables are mutually dependent on each other, and hence it can be used for recursion.

  8. Regex(Regex)

The "Regex" syntactical instance type parses successfully if the current point in the input matches this regex.

  9. Questioned(&str, &SyntacticalInstance)

The "Questioned" syntactical instance type captures the result of parsing a syntactical instance (the second field) and parses that result with another syntactical instance (whose name is given in the first field).
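To make the behaviour of these types more concrete, here is a minimal, standard-library-only sketch of how a few of them could be evaluated at a point in the input. It is not the crate's implementation: the `Instance` enum and its `parse_at` method are hypothetical names, the Regex, Search and Questioned types are left out, and two details are assumptions of the sketch (Either takes the first alternative that succeeds, and Not consumes no input when it succeeds).

```rust
// Hypothetical model of some syntactical instance types.
// These are NOT the crate's own definitions.
enum Instance {
    Set(Vec<Instance>),
    ZeroOrMore(Box<Instance>),
    OneOrMore(Box<Instance>),
    Either(Vec<Instance>),
    Not(Box<Instance>),
    Regular(String),
}

impl Instance {
    /// Tries to parse `input` starting at byte offset `pos`.
    /// Returns the offset just past the match, or `None` on failure.
    fn parse_at(&self, input: &str, pos: usize) -> Option<usize> {
        match self {
            // Succeeds if the input at `pos` starts with the given string.
            Instance::Regular(text) => {
                if input[pos..].starts_with(text.as_str()) {
                    Some(pos + text.len())
                } else {
                    None
                }
            }
            // Parses every item in order; fails as soon as one item fails.
            Instance::Set(items) => {
                let mut cur = pos;
                for item in items {
                    cur = item.parse_at(input, cur)?;
                }
                Some(cur)
            }
            // Repeats the inner instance as often as possible; never fails.
            Instance::ZeroOrMore(inner) => Some(repeat(inner, input, pos)),
            // Like ZeroOrMore, but the first iteration must succeed.
            Instance::OneOrMore(inner) => {
                let first = inner.parse_at(input, pos)?;
                Some(repeat(inner, input, first))
            }
            // Assumption: the first alternative that succeeds wins.
            Instance::Either(options) => {
                options.iter().find_map(|option| option.parse_at(input, pos))
            }
            // Assumption: a successful Not consumes no input.
            Instance::Not(inner) => match inner.parse_at(input, pos) {
                Some(_) => None,
                None => Some(pos),
            },
        }
    }
}

/// Applies `inner` repeatedly from `pos` and returns the final offset.
fn repeat(inner: &Instance, input: &str, pos: usize) -> usize {
    let mut cur = pos;
    while let Some(next) = inner.parse_at(input, cur) {
        if next == cur {
            break; // stop on an empty match to avoid looping forever
        }
        cur = next;
    }
    cur
}
```

In this model, a Set made of `Regular("print ")`, a OneOrMore over an Either of `Regular("a")` and `Regular("b")`, and `Regular(";")` accepts `print abba;` in full and rejects `print ;`, because the OneOrMore needs at least one iteration.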

An example of this crate's usage, which parses 300,000 print statements in less than a second on some computers, is as follows:

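fn main() {
    // Input program: 300,000 copies of a single print statement.
    let code = r#"
print "Hello, world!";
"#.repeat(300_000);

    // SPDL description of the grammar to compile into a parser.
    let spdl_code = r##"
freeform true
string = /"[^"]*"/
printStmt = print #string#;
seterror printStmt Invalid print statement!
parse printStmt
"##;

    // Compile the SPDL description; on failure, print every error and abort.
    let parser = spdl::process_spdl(spdl_code);
    if parser.is_err() {
        let errors = parser.unwrap_err();
        for err in errors {
            println!("{}", err);
        }
        panic!("Failed test!");
    }
    let parser = parser.unwrap();

    // Time the parse. The input is leaked to obtain a reference that
    // lives for the rest of the program.
    let time = std::time::Instant::now();
    parser.get_syntax_tree(Box::leak(Box::new(code.into_boxed_str())), false).unwrap();
    println!("{:?}", time.elapsed());
}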

Valid SPDL code consists of comments, variable declarations, freeform configurations, custom error-handling configurations, and a single parsing configuration. SPDL is not a free-form language: every line is trimmed on both ends before processing, and must then either be completely empty or start and complete exactly one of these constructs.

Every line that starts with a hashtag after trimming is a comment and is ignored. Every empty line after trimming is also ignored.

Every variable declaration has this rigid syntax:

varName = sequence

The sequence in the declaration can either be empty or consist of one or more of the following syntactical descriptions, written one after another:

*sequenceGoesHere* (This describes a zero or more statement for a syntactical instance with the ZeroOrMore type)

+sequenceGoesHere+ (This describes a one or more statement for a syntactical instance with the OneOrMore type)

|sequenceGoesHere$maybeAnotherHere| (This describes an either statement for a syntactical instance with the Either type)

/regexGoesHere/ (This describes a regex statement for a syntactical instance with the Regex type)

^sequenceGoesHere^ (This describes a not statement for a syntactical instance with the Not type)

&variableName& (This describes a search statement for a syntactical instance with the Search type)

?parseAs:sequenceGoesHere? (This describes a parse statement for a syntactical instance with the Questioned type)

#variableName# (This includes a previously declared variable's syntax in the variable declaration)

\punctuationGoesHere (This describes an escape. No escape is invalid, but if a backslash is not followed by punctuation or a backslash, then it gets interpreted as a regular backslash followed by a regular character)

regularSyntacticalInstance (This describes a syntactical instance with the Regular type, consisting of a continuous run of one or more escapes or non-punctuation characters)

Punctuation, as mentioned above, is any of the following characters: *, +, |, /, ^, &, #, ?. Punctuation marks the start of a different kind of syntactical instance within the sequence of syntactical instances in a variable declaration. Escaping a punctuation character removes this special meaning.
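The escape rule can be sketched as a small function that turns the raw text of a regular syntactical instance into the literal string it stands for. This is only an illustration under stated assumptions, not the crate's code: the `unescape` function and the `PUNCTUATION` constant are hypothetical names, and keeping a trailing backslash as a literal backslash is a guess, since the text above does not cover that case.

```rust
/// The punctuation characters listed above.
const PUNCTUATION: [char; 8] = ['*', '+', '|', '/', '^', '&', '#', '?'];

/// Resolves escapes in the raw text of a regular syntactical instance.
/// (Hypothetical helper; the crate may handle escapes differently.)
fn unescape(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.peek() {
            // A backslash before punctuation or another backslash
            // yields that character without its special meaning.
            Some(&next) if next == '\\' || PUNCTUATION.contains(&next) => {
                out.push(next);
                chars.next();
            }
            // Otherwise the backslash is an ordinary character; the
            // character after it is handled by the next iteration.
            // Assumption: a trailing backslash is also kept as-is.
            _ => out.push('\\'),
        }
    }
    out
}
```

With this sketch, the raw text `a\*b` becomes `a*b`, while `a\nb` is left unchanged because `n` is neither punctuation nor a backslash.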

Every custom error handling configuration in SPDL, which sets up a custom error to throw when parsing a variable fails, has the following syntax:

seterror varName message

The parsing configuration tells the generated SPDL parser which variable should be parsed recursively until the end of the parser's input is reached. Its syntax is as follows:

parse varName

Every freeform configuration in SPDL consists of the "freeform" keyword, followed by a single space, followed by true or false. It configures whether or not free-formedness is enabled from that line onwards. Free-formedness is disabled by default. When enabled, it has the effect of surrounding every regular syntactical instance in a variable declaration with regexes that capture optional whitespace.
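As a rough illustration of the line-oriented syntax described in this section, the sketch below trims a line and decides which kind of SPDL line it is. It is not taken from the crate: the `Line` enum and the `classify` function are hypothetical, the order in which the keywords are checked is an assumption, and so is the requirement of exactly one space around `=` and after each keyword.

```rust
/// The kinds of line described in this section (hypothetical model).
#[derive(Debug, PartialEq)]
enum Line<'a> {
    Empty,
    Comment,
    Freeform(bool),
    SetError { var: &'a str, message: &'a str },
    Parse(&'a str),
    Declaration { var: &'a str, sequence: &'a str },
}

/// Classifies one line of SPDL source, or returns `None` if the line
/// fits none of the descriptions.
fn classify(raw: &str) -> Option<Line<'_>> {
    // Every line is trimmed on both ends before processing.
    let line = raw.trim();
    if line.is_empty() {
        return Some(Line::Empty);
    }
    if line.starts_with('#') {
        return Some(Line::Comment);
    }
    if let Some(rest) = line.strip_prefix("freeform ") {
        return match rest {
            "true" => Some(Line::Freeform(true)),
            "false" => Some(Line::Freeform(false)),
            _ => None,
        };
    }
    if let Some(rest) = line.strip_prefix("seterror ") {
        // The variable name ends at the first space; the rest is the message.
        let (var, message) = rest.split_once(' ')?;
        return Some(Line::SetError { var, message });
    }
    if let Some(var) = line.strip_prefix("parse ") {
        return Some(Line::Parse(var));
    }
    if let Some((var, sequence)) = line.split_once(" = ") {
        return Some(Line::Declaration { var, sequence });
    }
    // A declaration with an empty sequence loses its trailing space to trim().
    let var = line.strip_suffix(" =")?;
    Some(Line::Declaration { var, sequence: "" })
}
```

Run over the SPDL source from the usage example, this sketch would report one freeform configuration, two variable declarations, one custom error configuration and one parsing configuration.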
