16 breaking releases

new 0.17.0 Jan 8, 2025
0.16.0 Dec 29, 2024
0.15.0 Dec 27, 2024
0.14.0 Jun 27, 2024
0.6.0 Dec 19, 2023

#82 in Parser tooling


406 downloads per month
Used in html-string

MIT license

28KB
522 lines

bparse

A combinatorial approach to matching bytes, especially useful for writing lexers and tokenizers.

The crate borrows concepts from other parser-combinator crates but heavily simplifies things by eschewing error management and focusing exclusively on parsing byte slices.


lib.rs:

Overview

Most parsing tasks boil down to extracting meaning out of arbitrary bytes. Regardless of how this is done, you'll have code that looks at a series of bytes and does something based on what was seen. This crate simplifies the task of repeatedly recognizing bytes in a byte slice.

The crate is made up of three parts:

  1. A Pattern trait for types that are able to recognize byte sequences.
  2. A list of common functions and combinators for composing Patterns together.
  3. The Bytes struct; a Cursor-like wrapper around some input that uses patterns to advance the position.
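
To make the first two parts concrete, here is a minimal std-only sketch of the idea. The names MiniPattern, match_len, OneOf and AtLeast are hypothetical and invented for illustration; they are not the crate's actual definitions, and the real trait may look quite different.

```rust
// Hypothetical sketch: these names are NOT bparse's real definitions.

/// A pattern inspects the start of `input` and reports how many bytes it matched.
trait MiniPattern {
    fn match_len(&self, input: &[u8]) -> Option<usize>;
}

/// A literal byte string matches itself as a prefix of the input.
impl MiniPattern for &[u8] {
    fn match_len(&self, input: &[u8]) -> Option<usize> {
        input.starts_with(*self).then_some(self.len())
    }
}

/// Matches any single byte contained in the given set.
struct OneOf<'a>(&'a [u8]);

impl MiniPattern for OneOf<'_> {
    fn match_len(&self, input: &[u8]) -> Option<usize> {
        match input.first() {
            Some(b) if self.0.contains(b) => Some(1),
            _ => None,
        }
    }
}

/// Applies the inner pattern repeatedly and succeeds only if it
/// matched at least `self.0` times in a row.
struct AtLeast<P>(usize, P);

impl<P: MiniPattern> MiniPattern for AtLeast<P> {
    fn match_len(&self, input: &[u8]) -> Option<usize> {
        let mut consumed = 0;
        let mut count = 0;
        while let Some(n) = self.1.match_len(&input[consumed..]) {
            if n == 0 {
                break; // an empty match would never make progress
            }
            consumed += n;
            count += 1;
        }
        (count >= self.0).then_some(consumed)
    }
}
```

With these definitions, AtLeast(1, OneOf(b" \t")).match_len(b" \t x") yields Some(3), and the same pattern yields None on input that does not start with a space or tab.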

Creating Patterns

Spaces in HTTP start lines:

The elements of an HTTP request line are usually separated by a single space. The spec is more permissive and allows an arbitrary number of spaces or tabs. Here is a pattern that can be used to skip heterogeneous whitespace:

use bparse::{oneof, at_least};
at_least(1, oneof(b" \t"));
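
For comparison, the sketch below does the same job with the standard library only: it splits a request line into its three elements while tolerating any mix of spaces and tabs between them. Both function names are hypothetical helpers, and line terminators (CRLF) are assumed to have been stripped already.

```rust
/// Length of the leading run of spaces and tabs, or None if there is none.
/// Plain-std stand-in for what `at_least(1, oneof(b" \t"))` is meant to skip.
fn leading_blank_run(input: &[u8]) -> Option<usize> {
    let n = input
        .iter()
        .take_while(|&&b| b == b' ' || b == b'\t')
        .count();
    (n >= 1).then_some(n)
}

/// Splits a request line into method, target and version (hypothetical helper).
/// Returns None if an element is missing or a separator is absent.
fn split_request_line(line: &[u8]) -> Option<[&[u8]; 3]> {
    let mut parts: [&[u8]; 3] = [b"", b"", b""];
    let mut rest = line;
    for (i, part) in parts.iter_mut().enumerate() {
        // An element is a non-empty run of bytes that are neither space nor tab.
        let len = rest
            .iter()
            .take_while(|&&b| b != b' ' && b != b'\t')
            .count();
        if len == 0 {
            return None;
        }
        *part = &rest[..len];
        rest = &rest[len..];
        // The first two elements must be followed by at least one blank.
        if i < 2 {
            rest = &rest[leading_blank_run(rest)?..];
        }
    }
    rest.is_empty().then_some(parts)
}
```

The hand-rolled version needs explicit index bookkeeping for every separator, which is the kind of repetition a reusable whitespace pattern removes.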

JSON numbers:

Recognizing JSON numbers can get tricky. The spec allows for numbers like 12, -398.42, and even 12.4e-3. Here we incrementally build up a pattern called number that recognizes all JSON number forms:

use bparse::{Pattern, oneof, range, at_least, optional};

let sign = optional(oneof(b"-+"));
let onenine = range(b'1', b'9');
let digit = "0".or(onenine);
let digits = at_least(1, digit);
let fraction = optional(".".then(digits));
let exponent = optional("E".then(sign).then(digits).or("e".then(sign).then(digits)));
let integer = onenine
    .then(digits)
    .or("-".then(onenine).then(digits))
    .or("-".then(digit))
    .or(digit);
let number = integer.then(fraction).then(exponent);
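
To show which inputs this grammar accepts, here is a hand-written std-only recognizer that follows the same structure (integer, then optional fraction, then optional exponent). The function json_number_len is a hypothetical name, and the sketch assumes prefix matching: it returns the length of the number found at the start of the input rather than requiring the whole input to be a number. Whether the crate's patterns behave the same way on trailing input is an assumption here.

```rust
/// Returns the length of the JSON number at the start of `input`, if any.
/// Hypothetical helper mirroring the integer / fraction / exponent grammar.
fn json_number_len(input: &[u8]) -> Option<usize> {
    // Counts consecutive ASCII digits starting at index `from`.
    let digits = |from: usize| {
        input[from..]
            .iter()
            .take_while(|b| b.is_ascii_digit())
            .count()
    };

    let mut pos = 0;

    // integer: optional '-', then a lone '0' or a 1-9 digit followed by more digits
    if input.first() == Some(&b'-') {
        pos += 1;
    }
    match input.get(pos) {
        Some(b'0') => pos += 1,
        Some(b'1'..=b'9') => pos += digits(pos),
        _ => return None,
    }

    // fraction: '.' followed by at least one digit, otherwise left unconsumed
    if input.get(pos) == Some(&b'.') {
        let n = digits(pos + 1);
        if n > 0 {
            pos += 1 + n;
        }
    }

    // exponent: 'e' or 'E', optional sign, at least one digit
    if matches!(input.get(pos), Some(b'e' | b'E')) {
        let mut p = pos + 1;
        if matches!(input.get(p), Some(b'+' | b'-')) {
            p += 1;
        }
        let n = digits(p);
        if n > 0 {
            pos = p + n;
        }
    }

    Some(pos)
}
```

Note that a leading zero ends the integer part, so for an input such as 0123 only the first byte is recognized as a number.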

Using Patterns

If you have written parsers before, you have probably implemented a wrapper around your raw input with methods such as peek, accept, and next. We do this because it simplifies keeping track of our position and asserting things about the input. The Bytes struct does exactly that.
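
For readers who have not written such a wrapper, a minimal version might look like the sketch below. The Cursor type and its methods are hypothetical and only illustrate the general shape; they are not the crate's Bytes API.

```rust
/// Hypothetical cursor over a byte slice; not the crate's `Bytes` type.
struct Cursor<'i> {
    input: &'i [u8],
    pos: usize,
}

impl<'i> Cursor<'i> {
    fn new(input: &'i [u8]) -> Self {
        Cursor { input, pos: 0 }
    }

    /// Looks at the next byte without consuming it.
    fn peek(&self) -> Option<u8> {
        self.input.get(self.pos).copied()
    }

    /// Consumes and returns the next byte.
    fn next(&mut self) -> Option<u8> {
        let b = self.peek()?;
        self.pos += 1;
        Some(b)
    }

    /// Consumes `expected` if the remaining input starts with it.
    /// The position is left unchanged when there is no match.
    fn accept(&mut self, expected: &[u8]) -> bool {
        let hit = self.input[self.pos..].starts_with(expected);
        if hit {
            self.pos += expected.len();
        }
        hit
    }

    /// True once every byte has been consumed.
    fn eof(&self) -> bool {
        self.pos >= self.input.len()
    }
}
```

The key property is that a failed accept leaves the position untouched, so alternatives can be tried one after another without manual backtracking.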

Here is a contrived example of parsing a Set-Cookie header value. If you were actually doing this, the code would be a bit more structured (a state machine perhaps?), but you would still use Bytes in a similar manner.

use std::str::from_utf8;
use bparse::{Bytes, oneof, noneof, at_least};

let cookie = " id=b839d87df;Domain=foo.com;   HttpOnly;";

let mut bytes = Bytes::from(cookie);

let mut is_http_only = false;
let mut domain = None;
let mut name = "";
let mut value = "";

let until_semicolon = at_least(1, noneof(b";"));
let until_eql = at_least(1, noneof(b"="));
let optional_ws = at_least(0, oneof(b"\t "));

loop {
    if bytes.eof() {
        break;
    }

    let _ = bytes.parse(optional_ws);

    if bytes.parse("Domain=").is_some() {
        domain = bytes.parse(until_semicolon).map(|b| from_utf8(b).unwrap());
        let _ = bytes.parse(";");
        continue;
    }

    if bytes.parse("HttpOnly;").is_some() {
        is_http_only = true;
        continue;
    }

    if let Some(cookie_name) = bytes.parse(until_eql) {
        let _ = bytes.parse("=");
        name = from_utf8(cookie_name).unwrap();
        let Some(cookie_value) = bytes.parse(until_semicolon) else {
            panic!("missing cookie value");
        };
        value = from_utf8(cookie_value).unwrap();
        let _ = bytes.parse(";");
        continue;
    }

    // Nothing matched: stop rather than loop forever on unexpected input.
    break;
}

assert!(is_http_only);
assert_eq!(domain, Some("foo.com"));
assert_eq!(name, "id");
assert_eq!(value, "b839d87df");

No runtime deps