18 breaking releases
Version | Released |
---|---|
0.19.0 | Dec 17, 2023 |
0.17.0 | Dec 9, 2023 |
0.16.0 | Oct 7, 2023 |
Parser Compose
DEPRECATED. See bparse
⚠️ Warning ☣️
Homemade, hand-rolled code ahead. Experimental. May not function as advertised.
What?
parser-compose is a crate for writing parsers for arbitrary data formats. You
can use it to parse string slices and slices whose elements implement Clone.
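To make the idea of parsing a slice of arbitrary elements concrete, here is a minimal hand-rolled sketch. It does not use parser-compose at all: `item` and `Token` are hypothetical names invented for illustration, and the extra `PartialEq` bound is an assumption of this sketch, not a documented requirement of the crate. It shows one plausible reason a `Clone` bound is useful: a matched element can be returned by value while the rest of the slice stays borrowed.

```rust
// Illustrative sketch only: `item` and `Token` are hypothetical names and are
// not part of parser-compose.

/// A made-up token type standing in for "any element that implements Clone".
#[derive(Clone, Debug, PartialEq)]
enum Token {
    Num(i64),
    Plus,
}

/// Matches `expected` at the front of `input`.
/// On success, returns a clone of the matched element and the rest of the slice.
fn item<'a, T: Clone + PartialEq>(expected: &T, input: &'a [T]) -> Option<(T, &'a [T])> {
    let (first, rest) = input.split_first()?;
    if first == expected {
        Some((first.clone(), rest))
    } else {
        None
    }
}
```

Because `item` is generic over the element type, the same function works on a `&[Token]` and on a `&[u8]`.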
Similar projects
How is this different?
parser-compose does nothing for error handling or reporting. If the parser
succeeds you get a Some(..) with the value. If the parser fails you get a
None. Many parsing tasks do not require knowing why parsing failed, only
that it did. For example, when parsing an HTTP message header, you just want to
know if it is valid or not. This crate focuses on this use case, and in so
doing sheds all the ceremony involved in supporting parser error handling.
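The Some/None convention can be sketched without any library. The functions below (`digit`, `two_digits`, `literal`, `time_of_day`) are hypothetical helpers written for this illustration, not parser-compose APIs. Each returns the parsed value together with the unconsumed input, and the `?` operator propagates failure without any error type.

```rust
// Hand-rolled sketch; none of these functions come from parser-compose.

/// Parses one ASCII digit from the front of `input`.
/// Returns the digit's value and the unconsumed remainder, or `None`.
fn digit(input: &str) -> Option<(u8, &str)> {
    let c = input.chars().next()?;
    let value = c.to_digit(10)?;
    Some((value as u8, &input[c.len_utf8()..]))
}

/// Parses exactly two digits into a number, e.g. "08" becomes 8.
fn two_digits(input: &str) -> Option<(u8, &str)> {
    let (tens, rest) = digit(input)?;
    let (ones, rest) = digit(rest)?;
    Some((tens * 10 + ones, rest))
}

/// Matches the literal `expected` at the front of `input`.
fn literal<'a>(expected: &str, input: &'a str) -> Option<((), &'a str)> {
    input.strip_prefix(expected).map(|rest| ((), rest))
}

/// time-of-day = hour ":" minute ":" second
fn time_of_day(input: &str) -> Option<((u8, u8, u8), &str)> {
    let (h, rest) = two_digits(input)?;
    let ((), rest) = literal(":", rest)?;
    let (m, rest) = two_digits(rest)?;
    let ((), rest) = literal(":", rest)?;
    let (s, rest) = two_digits(rest)?;
    Some(((h, m, s), rest))
}
```

The caller learns only whether the input matched. That is the trade-off described above: there is no position or reason attached to a `None`.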
Examples
Parsing an IMF-fixdate HTTP date:
use parser_compose::{Parser, utf8_scalar};

struct ImfDate {
    day_name: String,
    day: u8,
    month: String,
    year: u16,
    hour: u8,
    minute: u8,
    second: u8,
}

fn imfdate(input: &str) -> Option<(ImfDate, &str)> {
    let digit = utf8_scalar(0x30..=0x39);
    let twodigits = digit
        .repeats(2)
        .input()
        .map(|s| u8::from_str_radix(s, 10).unwrap());

    // day-name = %s"Mon" / %s"Tue" / %s"Wed"
    //          / %s"Thu" / %s"Fri" / %s"Sat" / %s"Sun"
    let day_name = "Mon".or("Tue").or("Wed")
        .or("Thu").or("Fri").or("Sat").or("Sun")
        .map(str::to_string);
    let day = twodigits;

    // month = %s"Jan" / %s"Feb" / %s"Mar" / %s"Apr"
    //       / %s"May" / %s"Jun" / %s"Jul" / %s"Aug"
    //       / %s"Sep" / %s"Oct" / %s"Nov" / %s"Dec"
    let month = "Jan".or("Feb").or("Mar").or("Apr")
        .or("May").or("Jun").or("Jul").or("Aug")
        .or("Sep").or("Oct").or("Nov").or("Dec")
        .map(str::to_string);
    let year = digit
        .repeats(4)
        .input()
        .map(|s| u16::from_str_radix(s, 10).unwrap());

    // date1 = day SP month SP year
    let date1 = (day, " ", month, " ", year)
        .map(|(d, _, m, _, y)| (d, m, y));
    let gmt = "GMT";
    let hour = twodigits;
    let minute = twodigits;
    let second = twodigits;

    // time-of-day = hour ":" minute ":" second
    let time_of_day = (hour, ":", minute, ":", second)
        .map(|(h, _, m, _, s)| (h, m, s));

    // IMF-fixdate = day-name "," SP date1 SP time-of-day SP GMT
    (day_name, ", ", date1, " ", time_of_day, " ", gmt)
        .map(|res| ImfDate {
            day_name: res.0,
            day: res.2.0,
            month: res.2.1,
            year: res.2.2,
            hour: res.4.0,
            minute: res.4.1,
            second: res.4.2,
        })
        .try_parse(input)
}

let input = "Sun, 06 Nov 1994 08:49:37 GMT";
let (date, rest) = imfdate(input).unwrap();
assert_eq!(date.day_name, "Sun");
assert_eq!(date.day, 6);
assert_eq!(date.month, "Nov");
assert_eq!(date.year, 1994);
assert_eq!(date.hour, 8);
assert_eq!(date.minute, 49);
assert_eq!(date.second, 37);

// missing comma after day name
assert!(
    imfdate("Sun 06 Nov 1994 08:49:37 GMT").is_none(),
);
Validating an HTTP quoted string:
use parser_compose::{Parser, byte};

/// Tries to parse a quoted string out of `input`.
/// If it succeeds, the returned tuple contains the quoted string (without
/// quotes) along with the rest of the input
fn quoted_string(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let htab = byte(b'\t');
    let sp = byte(b' ');
    let dquote = byte(b'"');
    let obs_text = byte(0x80..=0xFF);
    let vchar = byte(0x21..=0x7E);

    // qdtext = HTAB / SP / %x21 / %x23-5B / %x5D-7E / obs-text
    let qdtext = htab
        .or(sp)
        .or(byte(0x21))
        .or(byte(0x23..=0x5B))
        .or(byte(0x5D..=0x7E))
        .or(obs_text);

    // quoted-pair = "\" ( HTAB / SP / VCHAR / obs-text )
    let quoted_pair = (
        byte(b'\\'),
        htab.or(sp).or(vchar).or(obs_text)
    ).input();

    // quoted-string = DQUOTE *( qdtext / quoted-pair ) DQUOTE
    (dquote, qdtext.or(quoted_pair).repeats(0..).input(), dquote)
        .map(|(_, r, _)| r)
        .try_parse(input)
}

// 3 quoted strings one after the other "1", "a" and "\""
let input = r##""1""a""\"""##.as_bytes();
let (first, rest) = quoted_string(input).unwrap();
assert_eq!(
    b"1",
    first
);

// Notice that because of its signature, `quoted_string` can be treated as a parser
let (remaining, _) = quoted_string.accumulate(2).try_parse(rest).unwrap();
assert_eq!(remaining.len(), 2);
assert_eq!(remaining[0], b"a");
assert_eq!(remaining[1], b"\\\"");
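The remark that a function can be treated as a parser "because of its signature" is commonly achieved in Rust with a blanket trait implementation over `Fn` types. The sketch below shows that general technique under stated assumptions: `MiniParser`, `run`, `times`, and `lower` are hypothetical names made up for this illustration, and nothing here is taken from parser-compose's actual source.

```rust
/// Hypothetical stand-in for a parser trait; not parser-compose's definition.
trait MiniParser<'a, T> {
    /// Runs the parser once against the front of `input`.
    fn run(&self, input: &'a [u8]) -> Option<(T, &'a [u8])>;

    /// Runs the parser exactly `n` times in sequence, collecting the outputs.
    fn times(&self, n: usize, mut input: &'a [u8]) -> Option<(Vec<T>, &'a [u8])> {
        let mut out = Vec::with_capacity(n);
        for _ in 0..n {
            let (value, rest) = self.run(input)?;
            out.push(value);
            input = rest;
        }
        Some((out, input))
    }
}

/// Blanket impl: any function or closure with the right signature is a parser.
impl<'a, T, F> MiniParser<'a, T> for F
where
    F: Fn(&'a [u8]) -> Option<(T, &'a [u8])>,
{
    fn run(&self, input: &'a [u8]) -> Option<(T, &'a [u8])> {
        self(input)
    }
}

/// A plain function that parses one ASCII lowercase letter.
fn lower(input: &[u8]) -> Option<(u8, &[u8])> {
    match input.split_first() {
        Some((&b, rest)) if b.is_ascii_lowercase() => Some((b, rest)),
        _ => None,
    }
}
```

With the trait in scope, `lower.times(2, b"abc")` calls a combinator method directly on the function item, much as the example above calls a method on `quoted_string`.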
Thanks
This crate would not have been possible without:
- This post called You could have invented Parser Combinators, which brought the concept of parser combinators down from "academic sounding term, no thank you" to "wow, I can understand this"
- This guide to writing parser combinators in Rust
- This article by the author of pom, which lays out the various approaches to writing parser combinators in Rust.