16 breaking releases
Version | Released |
---|---|
0.17.0 (new) | Jan 8, 2025 |
0.16.0 | Dec 29, 2024 |
0.15.0 | Dec 27, 2024 |
0.14.0 | Jun 27, 2024 |
0.6.0 | Dec 19, 2023 |
#82 in Parser tooling
406 downloads per month
Used in html-string
28KB
522 lines
bparse
A combinatorial approach to matching bytes, especially useful for writing lexers and tokenizers.
The crate borrows concepts from other parser-combinator crates but heavily simplifies things by eschewing error management and focusing exclusively on parsing byte slices.
Overview
Most parsing tasks boil down to extracting meaning from arbitrary bytes. Regardless of how this is done, you will have code that looks at a series of bytes and does something based on what it saw. This crate simplifies the task of repeatedly recognizing bytes in a byte slice.
The crate is made up of three parts:
- A `Pattern` trait for types that are able to recognize byte sequences.
- A list of common functions and combinators for composing `Pattern`s together.
- The `Bytes` struct: a `Cursor`-like wrapper around some input that uses patterns to advance the position.
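To make the first two parts concrete, here is a minimal, std-only sketch of what a pattern trait and a sequencing combinator could look like. Everything in it (`Pattern`, `match_len`, `OneOf`, `Then`) is a hypothetical name chosen for illustration; it is not bparse's actual definition, only one plausible shape for the idea.

```rust
/// Hypothetical trait, not bparse's real one: a pattern looks at the start
/// of `input` and reports how many bytes it recognized, if any.
trait Pattern: Copy {
    fn match_len(&self, input: &[u8]) -> Option<usize>;
}

/// A string literal recognizes itself as a prefix of the input.
impl Pattern for &str {
    fn match_len(&self, input: &[u8]) -> Option<usize> {
        input.starts_with(self.as_bytes()).then_some(self.len())
    }
}

/// Recognizes exactly one byte that is a member of the given set.
#[derive(Clone, Copy)]
struct OneOf<'a>(&'a [u8]);

impl Pattern for OneOf<'_> {
    fn match_len(&self, input: &[u8]) -> Option<usize> {
        match input.first() {
            Some(b) if self.0.contains(b) => Some(1),
            _ => None,
        }
    }
}

/// Combinator: recognizes the first pattern followed by the second.
#[derive(Clone, Copy)]
struct Then<P, Q>(P, Q);

impl<P: Pattern, Q: Pattern> Pattern for Then<P, Q> {
    fn match_len(&self, input: &[u8]) -> Option<usize> {
        let first = self.0.match_len(input)?;
        let second = self.1.match_len(&input[first..])?;
        Some(first + second)
    }
}
```

With these definitions, `Then("GET", OneOf(b" \t"))` recognizes the four bytes `GET ` at the start of `GET /index.html`. Making patterns `Copy` is a design choice in this sketch so that one pattern value can be reused many times.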
Creating Patterns
Spaces in HTTP start lines:
The elements of an HTTP request line are usually separated by a single space. The spec is more permissive and allows an arbitrary number of spaces or tabs. Here is a pattern that can be used to skip heterogeneous whitespace:
use bparse::{oneof, at_least};
at_least(1, oneof(b" \t"));
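For comparison, the hand-written logic that such a pattern stands in for might look like the std-only sketch below. The function names (`is_blank`, `blank_run_len`, `request_line_fields`) are hypothetical and do not come from bparse; the sketch only shows the "one or more spaces or tabs" rule applied to a request line.

```rust
/// True for the two bytes the pattern accepts: space and horizontal tab.
fn is_blank(b: &u8) -> bool {
    *b == b' ' || *b == b'\t'
}

/// Length of the leading run of spaces and tabs, or `None` if the input
/// does not start with at least one of them.
fn blank_run_len(input: &[u8]) -> Option<usize> {
    let n = input.iter().take_while(|b| is_blank(b)).count();
    (n >= 1).then_some(n)
}

/// Splits a request line into its fields, tolerating any mix of spaces
/// and tabs between them.
fn request_line_fields(line: &[u8]) -> Vec<&[u8]> {
    line.split(is_blank)
        .filter(|field| !field.is_empty())
        .collect()
}
```

Calling `request_line_fields(b"GET \t /index.html  HTTP/1.1")` yields the three fields `GET`, `/index.html`, and `HTTP/1.1`.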
JSON numbers:
Recognizing JSON numbers can get tricky.
The spec allows for numbers like `12`, `-398.42`, and even `12.4e-3`.
Here we incrementally build up a pattern called `number` that can recognize all JSON number forms:
use bparse::{Pattern, oneof, range, at_least, optional};
let sign = optional(oneof(b"-+"));
let onenine = range(b'1', b'9');
let digit = "0".or(onenine);
let digits = at_least(1, digit);
let fraction = optional(".".then(digits));
let exponent = optional("E".then(sign).then(digits).or("e".then(sign).then(digits)));
let integer = onenine
    .then(digits)
    .or("-".then(onenine).then(digits))
    .or("-".then(digit))
    .or(digit);
let number = integer.then(fraction).then(exponent);
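To see what the composed pattern is expected to accept, it can help to compare it with a hand-rolled recognizer for the same grammar. The sketch below is std-only and independent of bparse; `digits_len` and `json_number_len` are hypothetical helper names, and the function returns the length of the longest JSON number at the start of the input.

```rust
/// Number of ASCII digits at the start of `input`.
fn digits_len(input: &[u8]) -> usize {
    input.iter().take_while(|b| b.is_ascii_digit()).count()
}

/// Length of the JSON number at the start of `input`, or `None` if the
/// input does not begin with one.
fn json_number_len(input: &[u8]) -> Option<usize> {
    let mut pos = 0;

    // integer: optional '-', then a lone '0' or a nonzero digit plus more digits
    if input.first() == Some(&b'-') {
        pos += 1;
    }
    match input.get(pos) {
        Some(b'0') => pos += 1,
        Some(b'1'..=b'9') => pos += digits_len(&input[pos..]),
        _ => return None,
    }

    // fraction: '.' followed by at least one digit (otherwise left unconsumed)
    if input.get(pos) == Some(&b'.') {
        let n = digits_len(&input[pos + 1..]);
        if n > 0 {
            pos += 1 + n;
        }
    }

    // exponent: 'e' or 'E', optional sign, at least one digit
    if matches!(input.get(pos), Some(b'e' | b'E')) {
        let mut p = pos + 1;
        if matches!(input.get(p), Some(b'+' | b'-')) {
            p += 1;
        }
        let n = digits_len(&input[p..]);
        if n > 0 {
            pos = p + n;
        }
    }

    Some(pos)
}
```

The combinator version expresses the same rules declaratively, one grammar production per `let` binding, which is arguably easier to check against the spec than the index arithmetic above.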
Using Patterns
If you have written parsers before, you have probably implemented a wrapper around your raw input with methods such as `peek`, `accept`, `next()`, etc.
We do this because it simplifies keeping track of our position and asserting things about the input.
The `Bytes` struct does exactly that.
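For readers who have not written such a wrapper, here is a minimal std-only sketch of the idea. The `Cursor` type and its methods are hypothetical and written only for illustration; the `Bytes` struct has its own API, which is driven by patterns rather than by fixed byte strings.

```rust
/// Hypothetical hand-rolled cursor over a byte slice (not bparse's `Bytes`).
struct Cursor<'i> {
    input: &'i [u8],
    pos: usize,
}

impl<'i> Cursor<'i> {
    fn new(input: &'i [u8]) -> Self {
        Cursor { input, pos: 0 }
    }

    /// Looks at the next byte without consuming it.
    fn peek(&self) -> Option<u8> {
        self.input.get(self.pos).copied()
    }

    /// Consumes and returns the next byte.
    fn next(&mut self) -> Option<u8> {
        let b = self.peek()?;
        self.pos += 1;
        Some(b)
    }

    /// Consumes `expected` if the remaining input starts with it;
    /// otherwise leaves the position untouched.
    fn accept(&mut self, expected: &[u8]) -> bool {
        let matched = self.input[self.pos..].starts_with(expected);
        if matched {
            self.pos += expected.len();
        }
        matched
    }

    /// True once every byte has been consumed.
    fn eof(&self) -> bool {
        self.pos >= self.input.len()
    }
}
```

The key property is that a failed `accept` does not move the position, so several alternatives can be tried in sequence against the same input.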
Here is a contrived example of parsing a `Set-Cookie` header value.
If you were actually doing this, the code would be a bit more structured (a state machine, perhaps?), but you would still use `Bytes` in a similar manner.
use std::str::from_utf8;
use bparse::{Bytes, oneof, noneof, at_least};

let cookie = " id=b839d87df;Domain=foo.com; HttpOnly;";
let mut bytes = Bytes::from(cookie);

let mut is_http_only = false;
let mut domain = None;
let mut name = "";
let mut value = "";

let until_semicolon = at_least(1, noneof(b";"));
let until_eql = at_least(1, noneof(b"="));
let optional_ws = at_least(0, oneof(b"\t "));

loop {
    if bytes.eof() {
        break;
    }

    let _ = bytes.parse(optional_ws);

    if bytes.parse("Domain=").is_some() {
        domain = bytes.parse(until_semicolon).map(|b| from_utf8(b).unwrap());
        let _ = bytes.parse(";");
        continue;
    }

    if bytes.parse("HttpOnly;").is_some() {
        is_http_only = true;
        continue;
    }

    if let Some(cookie_name) = bytes.parse(until_eql) {
        let _ = bytes.parse("=");
        name = from_utf8(cookie_name).unwrap();
        let Some(cookie_value) = bytes.parse(until_semicolon) else {
            panic!("missing cookie value");
        };
        value = from_utf8(cookie_value).unwrap();
        let _ = bytes.parse(";");
        continue;
    }
}

assert!(is_http_only);
assert_eq!(domain, Some("foo.com"));
assert_eq!(name, "id");
assert_eq!(value, "b839d87df");