3 unstable releases

0.2.0 Feb 26, 2024
0.1.1 Feb 25, 2024
0.1.0 Feb 19, 2024

Used in spex

MPL-2.0 license

135KB
2K SLoC

Simple parser package.

This package provides a Parser which lets you peek at characters to see what's coming next in a character stream, and read the characters you expect. Its methods include:

  • peek() - returns the next character from the stream without removing it;
  • require(&str) - returns an error if the given sequence is not found next in the stream;
  • skip_while(Fn(char)->bool) - removes characters for as long as the predicate is satisfied;
  • read_up_to(char) - takes characters from the stream up to (but not including) the given character;
  • accept(char) - skips the given character if it is found next in the stream.
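
To make the semantics of these methods concrete, here is a minimal sketch built on the standard library's Peekable<Chars>. It is not sipp's implementation: the Sketch type, its constructor, and the return types below are assumptions chosen for illustration, and because the sketch reads from an in-memory string, most of its methods do not need to return a Result.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical stand-in for a character stream; this is NOT sipp's Parser.
struct Sketch<'a> {
    chars: Peekable<Chars<'a>>,
}

impl<'a> Sketch<'a> {
    fn new(input: &'a str) -> Self {
        Sketch { chars: input.chars().peekable() }
    }

    /// Look at the next character without consuming it.
    fn peek(&mut self) -> Option<char> {
        self.chars.peek().copied()
    }

    /// Consume `expected` if it comes next; report whether it was consumed.
    fn accept(&mut self, expected: char) -> bool {
        if self.peek() == Some(expected) {
            self.chars.next();
            true
        } else {
            false
        }
    }

    /// Consume characters for as long as `predicate` holds.
    fn skip_while(&mut self, predicate: impl Fn(char) -> bool) {
        while let Some(c) = self.peek() {
            if !predicate(c) {
                break;
            }
            self.chars.next();
        }
    }

    /// Collect characters up to, but not including, `delimiter`.
    fn read_up_to(&mut self, delimiter: char) -> String {
        let mut out = String::new();
        while let Some(c) = self.peek() {
            if c == delimiter {
                break;
            }
            out.push(c);
            self.chars.next();
        }
        out
    }

    /// Fail unless `expected` comes next. On a mismatch, the characters
    /// examined so far have already been consumed (a simplification).
    fn require(&mut self, expected: &str) -> Result<(), String> {
        for want in expected.chars() {
            match self.chars.next() {
                Some(got) if got == want => {}
                other => return Err(format!("expected {want:?}, found {other:?}")),
            }
        }
        Ok(())
    }
}
```

The key distinction is between peek, which only looks, and the other methods, which consume input when they succeed.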

All fallible methods return a Result and no method in this package should ever panic.

Parsing relies on a ByteBuffer, which wraps a byte stream, and on a decoder such as Utf8Decoder, which wraps a ByteBuffer and decodes its bytes into characters. Finally, a Parser is created by wrapping a decoder. The end result is a Parser which lets you peek at and read characters.
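
The same three layers (bytes, decoding, character access) can be sketched with the standard library alone. This is only an illustration of the layering idea, not how sipp is implemented: the function names are hypothetical, and unlike a buffering decoder this sketch eagerly reads the entire input into memory before decoding it.

```rust
use std::error::Error;
use std::io::Read;

/// Layer 1: drain a byte stream into memory (stand-in for a byte buffer).
fn read_all_bytes(mut source: impl Read) -> std::io::Result<Vec<u8>> {
    let mut bytes = Vec::new();
    source.read_to_end(&mut bytes)?;
    Ok(bytes)
}

/// Layer 2: decode the bytes as UTF-8, failing on invalid sequences.
fn decode_utf8(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(bytes)
}

/// Layer 3: expose the decoded characters and peek at the first one.
fn first_char(source: impl Read) -> Result<Option<char>, Box<dyn Error>> {
    let bytes = read_all_bytes(source)?;
    let text = decode_utf8(&bytes)?;
    let mut chars = text.chars().peekable();
    let first = chars.peek().copied();
    Ok(first)
}
```

Each layer can fail independently (an I/O error when reading, a decoding error on invalid bytes), which is why every step here returns a Result.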

Examples

Acornsoft Logo parser

Suppose you want to parse a simplified set of Acornsoft Logo instructions which accepts only the "FORWARD", "LEFT", and "RIGHT" instructions. Each instruction must come on a line of its own (lines are separated by a newline character), and each instruction is followed by one or more space characters and then a numeric amount. Example input might look like this:

FORWARD 10
RIGHT 45
FORWARD 20
RIGHT 10
FORWARD 5
LEFT 3

You could parse these instructions with sipp using code like this:

let input =
  "FORWARD 10\nRIGHT 45\nFORWARD 20\nRIGHT 10\nFORWARD 5\nLEFT 3";
// We know that Rust strings are UTF-8 encoded, so wrap the input
// bytes with a Utf8Decoder.
let decoder = Utf8Decoder::wrap(input.as_bytes());
// Now wrap the decoder with a Parser to give us useful methods
// for reading through the input.
let mut parser = Parser::wrap(decoder);
// Keep reading while there is still input available.
while parser.has_more()? {
    // Read the command by reading everything up to (but not
    // including) the next space.
    let command = parser.read_up_to(' ')?;
    // Skip past the (one or more) space characters.
    parser.skip_while(|c| c == ' ')?;
    // Read until the next newline (or the end of input, whichever
    // comes first).
    let number = parser.read_up_to('\n')?;
    // Now either there is no further input, or the next character
    // must be a newline. If the next character is a newline, skip
    // past it.
    parser.accept('\n')?;
}
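
The loop above reads each command and number but does nothing with them. One possible next step, sketched here without any dependency on sipp, is to turn each pair of strings into a typed value. The Instruction enum and the to_instruction function are hypothetical names, and the sketch assumes amounts are non-negative whole numbers.

```rust
/// Hypothetical typed form of one parsed line.
#[derive(Debug, PartialEq)]
enum Instruction {
    Forward(u32),
    Left(u32),
    Right(u32),
}

/// Convert a command word and its amount (both as text) into an
/// Instruction, rejecting unknown commands and non-numeric amounts.
fn to_instruction(command: &str, amount: &str) -> Result<Instruction, String> {
    // Assumption: amounts are non-negative integers that fit in a u32.
    let value = amount
        .trim()
        .parse::<u32>()
        .map_err(|e| format!("invalid amount {amount:?}: {e}"))?;
    match command {
        "FORWARD" => Ok(Instruction::Forward(value)),
        "LEFT" => Ok(Instruction::Left(value)),
        "RIGHT" => Ok(Instruction::Right(value)),
        other => Err(format!("unknown instruction {other:?}")),
    }
}
```

Returning a Result here keeps the conversion in line with the package's approach of reporting failures as errors instead of panicking.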

Comma-separated list parser

Given a hardcoded string which represents a comma-separated list, you could use this package to parse it like so:

let input = "first value,second value,third,fourth,fifth,etc";
let buffer = ByteBuffer::wrap(input.as_bytes());
let decoder = Utf8Decoder::wrap_buffer(buffer);
let mut parser = Parser::wrap(decoder);
let mut value_list = Vec::new();
// Keep reading while input is available.
while parser.has_more()? {
    // Read up to the next comma, or until the end of input
    // (whichever comes first).
    let value = parser.read_up_to(',')?;
    value_list.push(value);
    // Now either there is no further input, or the next character
    // must be a comma. If the next character is a comma, skip
    // past it.
    parser.accept(',')?;
}

assert_eq!(value_list
    .iter()
    .map(|s| s.to_string())
    .collect::<Vec<String>>(),
    vec!["first value",
        "second value",
        "third",
        "fourth",
        "fifth",
        "etc"]);

Release notes

0.1.0

Initial release.

0.1.1

  • Added has_more method to Parser.
  • Adjusted rustdoc based on advice found in Rust API Guidelines, primarily in separating out error descriptions from the lede and moving them into a dedicated "Errors" section within each method's rustdoc comment.

0.2.0

Altered return type of public method Parser.read_up_to(char) so that it now returns None instead of an empty String. Adjusted examples and unit tests accordingly.

No runtime deps