5 releases

0.4.3 Mar 13, 2020
0.4.2 Mar 8, 2020
0.4.1 Mar 7, 2020
0.4.0 Mar 7, 2020
0.2.0 Feb 21, 2020

#117 in Parser tooling

23 downloads per month

ISC license

125KB
1.5K SLoC

Rust crates.io docs.rs license

Autumn recursive decscent parser combinator library

Autumn is a library for building recursive descent parsers using combinators. It provides a set of basic parsers for parsing individual characters and common groupings of characters, primitives to collect parsed values, and combinators for combining and modifying parsers.

Getting started

A parser is an item that implements the Parser trait. The most simple way to implement the trait is with a function given the following signature:

fn parser(source: &str, location: Span) -> ParseResult<String> {
    /* ... */
}

This is a function that takes an input string slice its location in some original source text and will parse a String value starting at that location.

A simple identifier parser

The following is an example that implements a parser of C-like identifiers. These begin with an ASCII letter or underscore and are followed by any number of ASCII digits, letters, or underscores.

/// Parse a single alphabetic character or underscore
fn identifier_prefix(source: &str, location: Span) -> ParseResult<Span> {
    alphabetic.or("_").parse(source, location)
}

/// Parse a zero or more letters, digits, and underscores
fn identifier_body(source: &str, location: Span) -> ParseResult<Span> {
    identifier_prefix.or(digit).multiple().maybe().parse(source, location)
}

/// Accumulate the characters for a single identifier into a string
fn identifier(source: &str, location: Span) -> ParseResult<String> {
    identifier_prefix
        .and(identifier_body)
        .copy_string()
        .parse(source, location)
}

As the functions above each implement the Parser trait, combinators can be used directly on the functions.

A number of the most common combinators are shown in the example above.

  • or will take a parser and additionally try an alternative parser at the same location. This can produce a result from both parsers if they ar successful.

  • and will take a parser that produces a List and append the result of another parser that produces the same type of list.

  • multiple will take a parser that produces a List and attempt to apply that parser one or more times in succession.

  • maybe will take a parser that produces a List and attempt to apply that parser zero or one times. When using both multiple and maybe to achieve zero or more repetitions, multiple().maybe() must be used; maybe().multiple() can find an infinite number of ways to apply any parser on even an empty string.

  • map can be used to transform the type produced by a particular parser.

  • end will only produce a successful parse if there is no input remaining after the parser.

A few of the provided parsers have also been used above.

  • alphabetic will parse a single character that is an ASCII alphabetic character and produce a List<char>.

  • digit will parse a single character that is an ASCII digit character and produce a List<char>.

  • Any &str or String can be used to parse itself and produce a List<char>.

When invoking a parser the source must be provided as a string slice and the current position must be provided as a Span. An intial span can be provided for the start of any string using new_location but a span associated with a path to a file should be specified using path_location.

The ParseResult produced by a parser contains all valid parsings of the input string.

If an ambiguous parser is constructed, every unique manner in which that parser could be used to process the input will be produced; single_parse can be used to check that only a single valid parse was produced.

Every successful parse is encapsulated in a Meta that associates it with a Span that reflects the range within the source that contains the characters parsed for that result. The meta combinator can be used to obtain the location associated with a value during parsing.

Errors

A parser may also need to produce errors while parsing. The parser will discard errors that are not associated with a valid parse so some value must be associated with the error and the error will be associated with a particular location in the parse.

If there are errors associated with a ParseResult their type must be passed as the third type argument to the type constructor.

If any valid parse of the source produced errors is_success will return false.

The following example will either parse the first five letters of the alphabet in lower-case or will produce an error associated with the alphabetic characters starting at the same location.

/// Parses the first 5 letters of the alphabet in order
fn alphabet_parse() -> impl Parser<Span, &'static str> {
    "abcde"
        .on_none(
            alphabetic
                .multiple()
                .maybe()
                .and_then(|text| error(text, "Not in alphabetical order"))
        )
}

The on_none combinator can be used to provide an alternative parser that is used only if the original parser was unable to find any valid parse (including parsers that resulted in an error).

The on_failure combinator can be used to provide an alternative parser that is used when the original parser produced an error or if no valid parse was found or if a valid parse associated with an error was found.

The error parser will produce a valid parse with the provided value but will also produce the provided error. This allows parsing to continue and find further errors.

The ParseResult produced by a parser contains all errors produced by all valid parsings of the input. Each error will be associated with the location in the source where the error was produced.

Exceptions

Exceptions can be used to associate errors with an exact span of input rather than a single location. The throw parser can be used exactly like the error parser except it creates an exception, rather than an error, at the same location. The catch combinator can then be used to convert the exception into an error and extend the start of the span associated with the error all the way back to the start of the original parse.

The following example shows how the error produced by the alphabet_parse function above can be associated with the span of source code that was consumed to produce the error.

/// Parses the first 5 letters of the alphabet in order
fn alphabet_parse() -> impl Parser<Span, &'static str> {
    "abcde"
        .on_none(
            alphabetic
                .multiple()
                .maybe()
                .and_then(|text| throw(text, "Not in alphabetical order"))
        )
        .catch()
}

Current version: 0.4.3

License: ISC

No runtime deps