
no-std pem-iterator

Iterate over PEM-encoded data

2 unstable releases

Uses old Rust 2015

0.2.0 Nov 26, 2017
0.1.0 Nov 13, 2017



1.5K SLoC

PEM Iterator

Iterate over PEM-encoded data.


  • Enables decoding PEM formatted data via iterators.
  • Fast. Current benchmarks put it at about 2x-4x faster than the pem crate.
  • No dependencies, no unsafe, no dynamic allocation, only requires core.
  • Highly customizable encapsulation boundary parsing.
  • Resilient parsing. Errors generated by the underlying stream don't lose state.



Cargo.toml:

[dependencies]
pem-iterator = "0.2"

Crate root:

extern crate pem_iterator;


Example:

extern crate pem_iterator;

use pem_iterator::boundary::{BoundaryType, BoundaryParser, LabelMatcher};
use pem_iterator::body::Single;

const SAMPLE: &'static str = "-----BEGIN RSA PRIVATE KEY-----

let mut input = SAMPLE.chars().enumerate();

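// Parse the BEGIN boundary, accumulating the label into a buffer.
// The inner scope ends the parser's borrows of `input` and `label_buf`.
let mut label_buf = String::new();
{
    let mut parser = BoundaryParser::from_chars(BoundaryType::Begin, &mut input, &mut label_buf);
    assert_eq!(parser.next(), None);
    assert_eq!(parser.complete(), Ok(()));
}
println!("PEM label: {}", label_buf);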

// Parse the body
let data: Result<Vec<u8>, _> = Single::from_chars(&mut input).collect();
let data = data.unwrap();

// Verify the end boundary has the same label as the begin boundary
{
    let mut parser = BoundaryParser::from_chars(BoundaryType::End, &mut input, LabelMatcher(label_buf.chars()));
    assert_eq!(parser.next(), None);
    assert_eq!(parser.complete(), Ok(()));
}

println!("data: {:?}", data);

BoundaryParser and Label

The first task in parsing PEM-formatted data is parsing the BEGIN boundary. Enter BoundaryParser. This iterator type takes three parameters to construct:

  • An enum value for BEGIN vs END.
  • The stream to get characters from.
  • An object to deal with the label.

That third parameter holds a lot of power. As the parser encounters the label, it notifies this parameter of each character via the Label trait. This enables a range of behaviors, such as:

  • Accumulating the characters into a buffer (e.g. &mut String)
  • Matching against known characters (e.g. LabelMatcher("CERTIFICATE".chars()))
  • Discarding the characters completely (e.g. DiscardLabel)

In addition to a simple Mismatch error, label processing can also return custom, complex errors, which makes it versatile and extensible.

Since parsing the BEGIN label is totally separate from parsing END, one can mix and match strategies to customize the level of strictness (e.g. BEGIN and END can have different labels).
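
The three label strategies listed above can be illustrated with a small stand-alone sketch. It does not use the crate itself: the trait LabelSink, its methods accept and finish, the types Matcher and Discard, and the driver feed_label are all hypothetical names chosen for illustration, and the crate's real Label trait may look different.

```rust
use std::convert::Infallible;

/// Hypothetical stand-in for the crate's Label trait (names are assumptions):
/// the parser hands over the label one character at a time.
pub trait LabelSink {
    type Error;
    /// Called once per label character.
    fn accept(&mut self, c: char) -> Result<(), Self::Error>;
    /// Called when the label ends; the default accepts any label.
    fn finish(&mut self) -> Result<(), Self::Error> {
        Ok(())
    }
}

/// Strategy 1: accumulate the label into a buffer.
impl LabelSink for String {
    type Error = Infallible;
    fn accept(&mut self, c: char) -> Result<(), Infallible> {
        self.push(c);
        Ok(())
    }
}

/// Strategy 2: match the label against known characters.
pub struct Matcher<I>(pub I);

#[derive(Debug, PartialEq)]
pub struct Mismatch;

impl<I: Iterator<Item = char>> LabelSink for Matcher<I> {
    type Error = Mismatch;
    fn accept(&mut self, c: char) -> Result<(), Mismatch> {
        if self.0.next() == Some(c) { Ok(()) } else { Err(Mismatch) }
    }
    fn finish(&mut self) -> Result<(), Mismatch> {
        // Reject labels that stop before the expected text is used up.
        if self.0.next().is_none() { Ok(()) } else { Err(Mismatch) }
    }
}

/// Strategy 3: discard the label completely.
pub struct Discard;

impl LabelSink for Discard {
    type Error = Infallible;
    fn accept(&mut self, _c: char) -> Result<(), Infallible> {
        Ok(())
    }
}

/// Stand-in for the boundary parser: feeds each label character to the sink.
pub fn feed_label<L: LabelSink>(label: &str, sink: &mut L) -> Result<(), L::Error> {
    for c in label.chars() {
        sink.accept(c)?;
    }
    sink.finish()
}
```

Because the error type is an associated type, a strategy that cannot fail (accumulating or discarding) and one that can (matching) share the same driver, and a custom strategy is free to return a richer error than Mismatch.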

Chunked vs Single

For parsing the body this crate provides two iterators, Chunked and Single. The basic difference is that Chunked emits 3 bytes of output at a time (corresponding to 4 characters of input), while Single emits only 1 byte at a time.

There may be some performance differences between them, but at present they seem nearly identical. Originally the two were more distinct, with a trade-off between performance and functionality, but at this point the difference is largely an ergonomic one.
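
The 4-characters-to-3-bytes relationship can be made concrete with a stand-alone sketch. This is not the crate's implementation: ChunkedSketch, single_sketch, and sextet are hypothetical names, and the sketch deliberately skips padding, whitespace, and error reporting (it simply stops at the first character it cannot decode).

```rust
/// Map one base64 alphabet character to its 6-bit value.
fn sextet(c: char) -> Option<u32> {
    match c {
        'A'..='Z' => Some(c as u32 - 'A' as u32),
        'a'..='z' => Some(c as u32 - 'a' as u32 + 26),
        '0'..='9' => Some(c as u32 - '0' as u32 + 52),
        '+' => Some(62),
        '/' => Some(63),
        _ => None,
    }
}

/// Chunked-style decoding (hypothetical sketch): every group of
/// 4 input characters becomes one 3-byte item.
pub struct ChunkedSketch<I>(pub I);

impl<I: Iterator<Item = char>> Iterator for ChunkedSketch<I> {
    type Item = [u8; 3];

    fn next(&mut self) -> Option<[u8; 3]> {
        let mut bits = 0u32;
        for _ in 0..4 {
            // Stops (returns None) on end of input or an undecodable character.
            bits = (bits << 6) | sextet(self.0.next()?)?;
        }
        // 4 * 6 bits = 24 bits = 3 bytes.
        Some([(bits >> 16) as u8, (bits >> 8) as u8, bits as u8])
    }
}

/// Single-style decoding (hypothetical sketch): the same data,
/// flattened so that each item is one byte.
pub fn single_sketch<I: Iterator<Item = char>>(chars: I) -> impl Iterator<Item = u8> {
    ChunkedSketch(chars).flat_map(|chunk| (0..3).map(move |i| chunk[i]))
}
```

In this sketch the byte-at-a-time form is just a flattened view of the chunked form, which is one way to see why the remaining difference between the two is mostly about ergonomics.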

Resilient parsing

The major types of this crate (BoundaryParser, Chunked, and Single) are all iterators. It's obvious why the body parsers are iterators: they need to iterate over the bytes of output. But why is BoundaryParser?

Basically, it makes parsing more resilient. If the underlying stream emits an error, it can be forwarded to the caller and dealt with without losing parsing state. Is this useful? Probably not. Most of the time you'd just want to fail if the stream errors. But it is kind of neat.
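
To show what "forwarded without losing parsing state" can look like, here is a stand-alone sketch of a parser that is an iterator over the stream's errors. It is an illustration under assumptions, not the crate's code: PrefixParser, ParseError, and the fixed-prefix matching are hypothetical, and only the next/complete calling pattern mirrors the example earlier in this README.

```rust
use std::str::Chars;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ParseError {
    Mismatch,      // the stream produced an unexpected character
    UnexpectedEof, // the stream ended before the prefix was matched
    Incomplete,    // complete() was called before parsing finished
}

/// Hypothetical parser that matches a fixed prefix against a fallible
/// character stream. Its iterator items are the stream's own errors.
pub struct PrefixParser<'a, I> {
    stream: I,
    expected: Chars<'a>,
    outcome: Option<Result<(), ParseError>>,
}

impl<'a, I> PrefixParser<'a, I> {
    pub fn new(expected: &'a str, stream: I) -> Self {
        PrefixParser { stream: stream, expected: expected.chars(), outcome: None }
    }

    /// Result of the parse once next() has returned None.
    pub fn complete(&self) -> Result<(), ParseError> {
        self.outcome.unwrap_or(Err(ParseError::Incomplete))
    }
}

impl<'a, I, E> Iterator for PrefixParser<'a, I>
where
    I: Iterator<Item = Result<char, E>>,
{
    type Item = E;

    fn next(&mut self) -> Option<E> {
        while self.outcome.is_none() {
            // Peek at the next expected character without consuming it, so a
            // stream error never costs a position in the prefix.
            let want = match self.expected.clone().next() {
                Some(c) => c,
                None => {
                    self.outcome = Some(Ok(()));
                    break;
                }
            };
            match self.stream.next() {
                Some(Ok(c)) if c == want => {
                    self.expected.next();
                }
                Some(Ok(_)) => self.outcome = Some(Err(ParseError::Mismatch)),
                // Forward the stream error; parsing resumes on the next call.
                Some(Err(e)) => return Some(e),
                None => self.outcome = Some(Err(ParseError::UnexpectedEof)),
            }
        }
        None
    }
}
```

A caller that does not care about resuming can treat any item from next() as fatal; a caller that does care can handle the error and call next() again, and the sketch picks up at the same position in the prefix.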
