#parser #combinator

parser-compose

A library for writing and composing parsers for arbitrary file or data formats

18 breaking releases

0.19.0 Dec 17, 2023
0.17.0 Dec 9, 2023
0.16.0 Oct 7, 2023

#39 in Parser tooling

875 downloads per month

MIT license

48KB
788 lines

Parser Compose

DEPRECATED. See bparse

⚠️ Warning ☣️
Homemade, hand-rolled code ahead. Experimental. May not function as advertised.

What?

parser-compose is a crate for writing parsers for arbitrary data formats. You can use it to parse string slices and slices whose elements implement Clone.

Similar projects

How is this different?

parser-compose does no error handling or reporting. If the parser succeeds, you get a Some(..) with the value. If the parser fails, you get a None. Many parsing tasks do not require knowing why parsing failed, only that it did. For example, when parsing an HTTP message header, you just want to know whether it is valid. This crate focuses on this use case and, in doing so, sheds the ceremony involved in supporting parser error handling.
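The `Some`/`None` contract can be illustrated without the crate at all. The following is a minimal, std-only sketch of the idea, not the crate's implementation; `digit` and `two_digits` are hypothetical helpers invented for this illustration. Each one returns the parsed value together with the unconsumed input on success, and `None` on failure, with no indication of where or why it failed.

```rust
// Std-only sketch of an Option-based parser. These helpers are hypothetical
// and are NOT part of parser-compose.

/// Parses one ASCII digit from the front of `input`.
fn digit(input: &str) -> Option<(u8, &str)> {
    // `?` returns None right away when the input is empty.
    let b = *input.as_bytes().first()?;
    if b.is_ascii_digit() {
        // An ASCII digit is exactly one byte, so index 1 is a char boundary.
        Some((b - b'0', &input[1..]))
    } else {
        None
    }
}

/// Parses exactly two digits into a number, e.g. "08" becomes 8.
fn two_digits(input: &str) -> Option<(u8, &str)> {
    // Any failing step short-circuits the whole parser to None.
    let (tens, rest) = digit(input)?;
    let (ones, rest) = digit(rest)?;
    Some((tens * 10 + ones, rest))
}
```

With these definitions, `two_digits("08:49")` yields `Some((8, ":49"))`, while `two_digits("8:49")` yields `None`. The caller learns only that the input did not match, which is the trade-off described above.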

Examples

Parsing an IMF HTTP date:

use parser_compose::{Parser,utf8_scalar};

struct ImfDate {
  day_name: String,
  day: u8,
  month: String,
  year: u16,
  hour: u8,
  minute: u8,
  second: u8,
}

fn imfdate(input: &str) -> Option<(ImfDate, &str)> {
  let digit = utf8_scalar(0x30..=0x39);
  let twodigits = digit
    .repeats(2)
    .input()
    .map(|s| u8::from_str_radix(s, 10).unwrap());


  // day-name     = %s"Mon" / %s"Tue" / %s"Wed"
  //              / %s"Thu" / %s"Fri" / %s"Sat" / %s"Sun"
  let day_name = "Mon".or("Tue").or("Wed")
    .or("Thu").or("Fri").or("Sat").or("Sun")
    .map(str::to_string);

  let day = twodigits;

  // month        = %s"Jan" / %s"Feb" / %s"Mar" / %s"Apr"
  //               / %s"May" / %s"Jun" / %s"Jul" / %s"Aug"
  //               / %s"Sep" / %s"Oct" / %s"Nov" / %s"Dec"
  let month = "Jan".or("Feb").or("Mar").or("Apr")
    .or("May").or("Jun").or("Jul").or("Aug")
    .or("Sep").or("Oct").or("Nov").or("Dec")
    .map(str::to_string);

  let year = digit
    .repeats(4)
    .input()
    .map(|s| u16::from_str_radix(s, 10).unwrap());

  // date1        = day SP month SP year
  let date1 = (day, " ", month, " ", year)
    .map(|(d, _, m, _, y)| (d, m, y));

  let gmt = "GMT";

  let hour = twodigits;
  let minute = twodigits;
  let second = twodigits;

  // time-of-day  = hour ":" minute ":" second
  let time_of_day = (hour, ":", minute, ":", second)
    .map(|(h,_,m,_,s)| (h, m, s));

  // IMF-fixdate  = day-name "," SP date1 SP time-of-day SP GMT
  (day_name, ", ", date1, " ", time_of_day, " ", gmt)
    .map(|res| ImfDate {
      day_name: res.0,
      day: res.2.0,
      month: res.2.1,
      year: res.2.2,
      hour: res.4.0,
      minute: res.4.1,
      second: res.4.2,
    }).try_parse(input)
}

let input = "Sun, 06 Nov 1994 08:49:37 GMT";

let (date, rest) = imfdate(input).unwrap();

assert_eq!(date.day_name, "Sun");
assert_eq!(date.day, 6);
assert_eq!(date.month, "Nov");
assert_eq!(date.year, 1994);
assert_eq!(date.hour, 8);
assert_eq!(date.minute, 49);
assert_eq!(date.second, 37);

// missing comma after day name
assert!(
  imfdate("Sun 06 Nov 1994 08:49:37 GMT").is_none(),
);

Validating an HTTP quoted string:

use parser_compose::{Parser,byte};

/// Tries to parse a quoted string out of `input`.
/// If it succeeds, the returned tuple contains the quoted string (without
/// the surrounding quotes) along with the rest of the input.
fn quoted_string(input: &[u8]) -> Option<(&[u8], &[u8])> {
  let htab = byte(b'\t');
  let sp = byte(b' ');
  let dquote = byte(b'"');
  let obs_text = byte(0x80..=0xFF); 
  let vchar = byte(0x21..=0x7E);

  // qdtext = HTAB / SP / %x21 / %x23-5B / %x5D-7E / obs-text
  let qdtext = htab
    .or(sp)
    .or(byte(0x21))
    .or(byte(0x23..=0x5B))
    .or(byte(0x5D..=0x7E))
    .or(obs_text);

  // quoted-pair = "\" ( HTAB / SP / VCHAR / obs-text )
  let quoted_pair = (
    byte(b'\\'),
    htab.or(sp).or(vchar).or(obs_text)
  ).input();


  // quoted-string = DQUOTE *( qdtext / quoted-pair ) DQUOTE
  (dquote, qdtext.or(quoted_pair).repeats(0..).input(), dquote)
    .map(|(_, r, _)| r)
    .try_parse(input)
}

// Three quoted strings, one after the other: "1", "a" and "\""
let input = r##""1""a""\"""##.as_bytes();

let (first, rest) = quoted_string(input).unwrap();

assert_eq!(
  b"1",
  first
);

// Notice that because of its signature, `quoted_string` can be treated as a parser
let (remaining, _) = quoted_string.accumulate(2).try_parse(rest).unwrap();

assert_eq!(remaining.len(), 2);
assert_eq!(remaining[0], b"a");
assert_eq!(remaining[1], b"\\\"");
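One plausible way a plain function can be "treated as a parser" is a blanket trait implementation over anything callable with the right signature. The sketch below is std-only and makes no claim about how parser-compose actually does it: the trait `Parse`, its method `accumulate_n` (a simplified stand-in for the idea behind `accumulate`), and the function `letter` are all hypothetical names chosen for this illustration.

```rust
// Std-only sketch. `Parse`, `accumulate_n` and `letter` are hypothetical
// and are NOT the definitions used by parser-compose.

/// Something that can try to parse a `T` from the front of a byte slice.
trait Parse<'a, T> {
    fn try_parse(&self, input: &'a [u8]) -> Option<(T, &'a [u8])>;

    /// Runs the parser exactly `n` times in sequence and collects the values.
    /// Fails with None if any single run fails.
    fn accumulate_n(&self, n: usize, mut input: &'a [u8]) -> Option<(Vec<T>, &'a [u8])> {
        let mut out = Vec::with_capacity(n);
        for _ in 0..n {
            let (value, rest) = self.try_parse(input)?;
            out.push(value);
            input = rest;
        }
        Some((out, input))
    }
}

/// Blanket impl: every function or closure with the signature
/// `Fn(&[u8]) -> Option<(T, &[u8])>` is automatically a `Parse`.
impl<'a, T, F> Parse<'a, T> for F
where
    F: Fn(&'a [u8]) -> Option<(T, &'a [u8])>,
{
    fn try_parse(&self, input: &'a [u8]) -> Option<(T, &'a [u8])> {
        self(input)
    }
}

/// A plain function with a matching signature: parses one ASCII letter.
fn letter(input: &[u8]) -> Option<(u8, &[u8])> {
    match input.split_first() {
        Some((&b, rest)) if b.is_ascii_alphabetic() => Some((b, rest)),
        _ => None,
    }
}
```

Because of the blanket impl, trait methods can be called directly on the function name: `letter.accumulate_n(2, b"ab1".as_slice())` returns the two letters and leaves `b"1"` unconsumed, while asking for three letters returns `None`.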

Thanks

This crate would not have been possible without:

No runtime deps