#serialization #serde-yaml #ros #config

bin+lib serde-saphyr

YAML deserializer for Serde, built on top of Saphyr, emphasizing panic-free parsing

8 releases

Uses new Rust 2024

new 0.0.8-alpha-pre Nov 9, 2025
0.0.7 Nov 4, 2025
0.0.6 Oct 16, 2025
0.0.3 Sep 30, 2025

#191 in Parser implementations

Download history 90/week @ 2025-09-21 287/week @ 2025-09-28 219/week @ 2025-10-05 234/week @ 2025-10-12 1726/week @ 2025-10-19 1725/week @ 2025-10-26 4787/week @ 2025-11-02

8,483 downloads per month
Used in 5 crates

MIT license

360KB
6.5K SLoC

serde-saphyr

serde-saphyr is a strongly typed YAML deserializer built on saphyr-parser. It aims to be panic-free on malformed input and to avoid unsafe code in library code. The crate deserializes YAML directly into your Rust types without constructing an intermediate tree of “abstract values.” It is not a fork of the older serde-yaml and does not share any code with it (some tests are reused). It provides both a serializer and a deserializer.

Why this approach?

  • Light on resources: Having almost no intermediate data structures should result in more efficient parsing, especially if anchors are used only lightly.
  • Also simpler: No code to support intermediate Values of all kinds.
  • Type-driven parsing: YAML that doesn’t match the expected Rust types is rejected early.
  • Safer by construction: No dynamic “any” objects; common YAML-based code-execution exploits do not apply.

Benchmarking

In our benchmarking project, we tested the following crates:

Crate Version Merge Keys Nested Enums Duplicate key rejection Notes
serde-saphyr 0.0.4 ✅ Native ✅ Configurable No unsafe, no unsafe-libyaml
serde-yaml-bw 2.4.1 ✅ Native ✅ Configurable Slow because Saphyr performs the budget check up front, before libyaml runs
serde-yaml-ng 0.10.0 ⚠️ partial
serde-yaml 0.9.34+deprecated ⚠️ partial Original, deprecated, repo archived
serde-norway 0.9 ⚠️ partial
serde-yml 0.0.12 ⚠️ partial Repo archived

Benchmarking was done with Criterion, giving the following results:

[Figure: relative median time vs. baseline]

As shown, serde-saphyr outperforms the other crates, even with the budget check enabled.

Other features

  • Configurable budgets: Enforce input limits to mitigate resource exhaustion (e.g., deeply nested structures or very large arrays); see Budget.
  • Anchors: The serializer supports emitting anchors for shared references (Rc, Arc, Weak) if they are properly wrapped (see below).
  • serde_json::Value is supported for parsing when no target structure is defined.
  • Robotics extensions: Support for the YAML dialect common in robotics (see below).

Deserialization

Duplicate keys

Duplicate key handling is configurable. By default a duplicate key is an error; “first wins” and “last wins” strategies are available via Options. The duplicate key policy applies not just to string keys but also to keys of other types (when deserializing into a map).
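
The three strategies can be pictured with a small standalone sketch. It uses only the standard library and does not call serde-saphyr; the DupPolicy enum and the collect_with_policy function are hypothetical names introduced for illustration, not the crate's API.

```rust
use std::collections::BTreeMap;

/// Hypothetical policy enum for this sketch; not the crate's actual type.
#[derive(Clone, Copy, Debug, PartialEq)]
enum DupPolicy {
    Error,
    FirstWins,
    LastWins,
}

/// Build a map from key-value pairs in document order, applying the policy
/// whenever a key repeats.
fn collect_with_policy(
    pairs: &[(&str, i32)],
    policy: DupPolicy,
) -> Result<BTreeMap<String, i32>, String> {
    let mut map: BTreeMap<String, i32> = BTreeMap::new();
    for &(key, value) in pairs {
        if map.contains_key(key) {
            match policy {
                DupPolicy::Error => return Err(format!("duplicate key: {key}")),
                DupPolicy::FirstWins => continue, // keep the earlier value
                DupPolicy::LastWins => {}         // fall through and overwrite
            }
        }
        map.insert(key.to_string(), value);
    }
    Ok(map)
}
```

Given the pairs a: 1, b: 2, a: 3, the Error policy rejects the input, FirstWins keeps a = 1, and LastWins keeps a = 3.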

Unsupported features

  • Tagged enums (!!EnumName RED) are not supported. Use mapping-based enums (EnumName: RED) instead. This also allows you to define nested enums if needed, which the YAML standard does not allow with tagged enums.

Usage

Parse YAML into a Rust structure with proper error handling. The crate name on crates.io is serde-saphyr, and the import path is serde_saphyr.

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Config {
  name: String,
  enabled: bool,
  retries: i32,
}

fn main() {
    let yaml_input = r#"
  name: "My Application"
  enabled: true
  retries: 5
...
"#;

    let config: Result<Config, _> = serde_saphyr::from_str(yaml_input);

    match config {
        Ok(parsed_config) => {
            println!("Parsed successfully: {:?}", parsed_config);
        }
        Err(e) => {
            eprintln!("Failed to parse YAML: {}", e);
        }
    }
}

Multiple documents

YAML streams can contain several documents separated by --- or ... markers. When deserializing with serde_saphyr::from_multiple, you still need to supply the vector element type up front (Vec<T>). That does not lock you into a single shape: make the element an enum and each document will deserialize into the matching variant. This lets you mix different payloads in one stream while retaining strong typing on the Rust side.

use serde::Deserialize;

#[derive(Debug, Deserialize, PartialEq)]
enum Document {
    #[serde(rename = "person")]
    Person { name: String, age: u8 },
    #[serde(rename = "pet")]
    Pet { kind: String },
}

fn main() {
    let input = r#"---
 person:
   name: Alice
   age: 30
---
 pet:
  kind: cat
---
 person:
   name: Bob
   age: 25
"#;
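    // The element type must be named explicitly so the target type can be inferred.
    let docs: Vec<Document> = serde_saphyr::from_multiple(input).expect("valid YAML stream");
    println!("Parsed {} documents", docs.len());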
}

Nested enums

Externally tagged enums nest naturally in YAML as maps keyed by the variant name. This enables strict, expressive models (enums with associated data) instead of generic maps.

use serde::Deserialize;

#[derive(Deserialize)]
struct Move {
  by: f32,
  constraints: Vec<Constraint>,
}

#[derive(Deserialize)]
enum Constraint {
  StayWithin { x: f32, y: f32, r: f32 },
  MaxSpeed { v: f32 },
}

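fn main() {
    // The fields of each variant must be indented deeper than the variant name,
    // otherwise they become sibling keys of the variant instead of its payload.
    let yaml = r#"
- by: 10.0
  constraints:
    - StayWithin:
        x: 0.0
        y: 0.0
        r: 5.0
    - StayWithin:
        x: 4.0
        y: 0.0
        r: 5.0
    - MaxSpeed:
        v: 3.5
"#;

    let robot_moves: Vec<Move> = serde_saphyr::from_str(yaml).unwrap();
    println!("Parsed {} moves", robot_moves.len());
}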

There are two variants of the deserialization functions: from_* and from_*_with_options. The latter takes Options to configure many aspects of parsing.

Composite keys

YAML supports complex (non-string) mapping keys. Rust maps can mirror this, allowing you to parse such structures directly.

use serde::Deserialize;
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq, Hash, Deserialize)]
struct Point {
  x: i32,
  y: i32
}

#[derive(Debug, PartialEq, Deserialize)]
struct Transform {
    // Transform between locations
    map: HashMap<Point, Point>,
}

fn main() {
    let yaml = r#"
map:
  {x: 1, y: 2}: {x: 3, y: 4}
  {x: 5, y: 6}: {x: 7, y: 8}
"#;
    let transform: Transform = serde_saphyr::from_str(yaml).unwrap();
    println!("{} entries", transform.map.len());
}

Booleans

By default, if the target field is a boolean, serde-saphyr attempts to interpret standard YAML 1.1 values as booleans (not just 'false' but also 'no', etc.). If you do not want this (or you are parsing into a JSON Value, where the type is wrongly inferred), enclose the value in quotes or set strict_booleans to true in Options.
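
As a rough illustration of the difference between the two modes, the sketch below classifies a plain scalar using the YAML 1.1 boolean spellings. It is standalone code, not the crate's implementation: parse_bool is a hypothetical helper, and the assumption that strict mode accepts only true and false is ours.

```rust
/// Interpret a plain scalar as a boolean.
/// `strict` stands in for the `strict_booleans` option; this sketch assumes
/// that strict mode accepts only `true` and `false`, which may differ from
/// what the crate actually does.
fn parse_bool(scalar: &str, strict: bool) -> Option<bool> {
    match scalar {
        "true" => Some(true),
        "false" => Some(false),
        // In strict mode nothing else counts as a boolean.
        _ if strict => None,
        // Remaining YAML 1.1 spellings of "true".
        "True" | "TRUE" | "y" | "Y" | "yes" | "Yes" | "YES" | "on" | "On" | "ON" => Some(true),
        // Remaining YAML 1.1 spellings of "false".
        "False" | "FALSE" | "n" | "N" | "no" | "No" | "NO" | "off" | "Off" | "OFF" => Some(false),
        _ => None,
    }
}
```

With strict set to false, no and off are read as false; with strict set to true, the same scalars are not booleans at all and would be reported as a type mismatch by a real parser.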

Deserializing into abstract JSON Value

If you must work with abstract types, you can also deserialize YAML into serde_json::Value. Serde will drive the process through deserialize_any because Value does not fix a Rust primitive type ahead of time. In exchange, you lose the strict type control that Rust struct types provide.

Binary scalars

YAML values tagged !!binary are base64-decoded when deserializing into Vec<u8> or String. For a String target, an error is reported if the decoded bytes are not valid UTF-8.

use serde::Deserialize;

#[derive(Debug, Deserialize, PartialEq)]
struct Blob {
    data: Vec<u8>,
}

fn parse_blob() {
    let blob: Blob = serde_saphyr::from_str("data: !!binary aGVsbG8=").unwrap();
    assert_eq!(blob.data, b"hello");
}

Merge keys

serde-saphyr supports merge keys, which reduce redundancy and verbosity by specifying shared key-value pairs once and then reusing them across multiple mappings. Here is an example with merge keys (inherited properties):

use serde::Deserialize;

/// Configuration to parse into. Does not include "defaults"
#[derive(Debug, Deserialize, PartialEq)]
struct Config {
    development: Connection,
    production: Connection,
}

#[derive(Debug, Deserialize, PartialEq)]
struct Connection {
    adapter: String,
    host: String,
    database: String,
}

fn main() {
    let yaml_input = r#"
# Here we define "default configuration"  
defaults: &defaults
  adapter: postgres
  host: localhost

development:
  <<: *defaults
  database: dev_db

production:
  <<: *defaults
  database: prod_db
"#;

    // Deserialize YAML with anchors, aliases and merge keys into the Config struct
    let parsed: Config = serde_saphyr::from_str(yaml_input).expect("Failed to deserialize YAML");

    // Define expected Config structure explicitly
    let expected = Config {
        development: Connection {
            adapter: "postgres".into(),
            host: "localhost".into(),
            database: "dev_db".into(),
        },
        production: Connection {
            adapter: "postgres".into(),
            host: "localhost".into(),
            database: "prod_db".into(),
        },
    };

    // Assert parsed config matches expected
    assert_eq!(parsed, expected);
}

Merge keys are standard in YAML 1.1. Although YAML 1.2 no longer includes merge keys in its specification, it doesn't explicitly disallow them either, and many parsers implement this feature.
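
The override rule behind merge keys (keys written explicitly in a mapping take precedence over keys pulled in through <<) can be sketched with plain maps. This is standalone illustration code following the YAML 1.1 merge key definition; apply_merge and mapping are hypothetical helpers, and the sketch assumes the crate follows the same rule.

```rust
use std::collections::BTreeMap;

type Mapping = BTreeMap<String, String>;

/// Resolve one `<<` merge: start from the merged-in mapping, then let the
/// keys written explicitly in the current mapping override it.
fn apply_merge(merged: &Mapping, explicit: &Mapping) -> Mapping {
    let mut result = merged.clone();
    for (key, value) in explicit {
        result.insert(key.clone(), value.clone());
    }
    result
}

/// Convenience constructor for the sketch.
fn mapping(pairs: &[(&str, &str)]) -> Mapping {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

Merging defaults (adapter: postgres, host: localhost) into a mapping that sets database: dev_db and host: devbox yields three keys, with host taken from the explicit mapping.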

Rust types as schema

The target Rust types act as a schema. Knowing whether a field is a string or a boolean allows the parser to accept 1.2 as either a number or the string "1.2" depending on the target type, and to interpret common YAML boolean shorthands like y, on, n, or off appropriately. Similarly, 0x2A is a hexadecimal number when parsed into an integer field, and a string when parsed into String. Legacy octal format like 0052 can be turned on in Options but is off by default.
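
A minimal sketch of the idea, independent of the crate: the same scalar text is interpreted differently depending on what the target asks for. Target, Typed and interpret are hypothetical names for illustration, and only decimal and 0x-prefixed integers are handled.

```rust
/// What the Rust field expects; hypothetical, for this sketch only.
enum Target {
    Int,
    Float,
    Text,
}

/// The value produced for the field.
#[derive(Debug, PartialEq)]
enum Typed {
    Int(i64),
    Float(f64),
    Text(String),
}

/// Interpret one scalar according to the requested target type.
/// Returns None when the text does not fit the target.
fn interpret(scalar: &str, target: Target) -> Option<Typed> {
    match target {
        // A string target takes the text verbatim.
        Target::Text => Some(Typed::Text(scalar.to_string())),
        Target::Float => scalar.parse::<f64>().ok().map(Typed::Float),
        Target::Int => {
            let parsed = match scalar.strip_prefix("0x") {
                Some(hex) => i64::from_str_radix(hex, 16),
                None => scalar.parse::<i64>(),
            };
            parsed.ok().map(Typed::Int)
        }
    }
}
```

Here 0x2A becomes the integer 42 for an integer target and the text "0x2A" for a string target, while 1.2 is rejected for an integer target instead of being silently truncated.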

Pathological inputs & budgets

Fuzzing shows that certain adversarial inputs can make YAML parsers consume excessive time or memory, enabling denial-of-service scenarios. To counter this, serde-saphyr offers a fast, configurable pre-check via a Budget, available through Options. Defaults are conservative; tighten them when you know your input shape, or disable the budget if you only parse YAML you generate yourself.

During reader-based deserialization, serde-saphyr does not buffer the entire payload; it parses incrementally, counting bytes and enforcing configured budgets. This design blocks denial-of-service attempts via excessively large inputs. When streaming from the reader through the iterator, other budget limits apply on a per-document basis, since such a reader may be expected to stream indefinitely. The total size of input is not limited in this case.
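
To make the idea of a pre-check concrete, here is a deliberately crude standalone sketch that rejects input exceeding a size or nesting limit before any real parsing happens. SketchBudget, BudgetError and check_budget are hypothetical names; the crate's Budget has its own fields and works on parser events rather than raw bytes.

```rust
/// Hypothetical limits for this sketch; not the crate's `Budget` type.
struct SketchBudget {
    max_bytes: usize,
    max_depth: usize,
}

#[derive(Debug, PartialEq)]
enum BudgetError {
    TooLarge,
    TooDeep,
}

/// Scan the input once and reject it as soon as a limit is exceeded.
/// Only flow-style brackets are counted and quoting is ignored, so this is
/// far cruder than an event-based check.
fn check_budget(input: &str, budget: &SketchBudget) -> Result<(), BudgetError> {
    if input.len() > budget.max_bytes {
        return Err(BudgetError::TooLarge);
    }
    let mut depth = 0usize;
    for byte in input.bytes() {
        match byte {
            b'[' | b'{' => {
                depth += 1;
                if depth > budget.max_depth {
                    return Err(BudgetError::TooDeep);
                }
            }
            b']' | b'}' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    Ok(())
}
```

The point of the design is that the check is a single cheap pass that fails early, so a hostile document is rejected before it can drive the expensive part of deserialization.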

Serialization

use serde::Serialize;

#[derive(Serialize)]
struct User { name: String, active: bool }

fn main() {
    let yaml = serde_saphyr::to_string(&User { name: "Ada".into(), active: true }).unwrap();
    assert!(yaml.contains("name: Ada"));
}

Anchors (Rc/Arc/Weak)

Serde-saphyr can conceptually connect YAML anchors with Rust shared references (Rc, Weak and Arc). You need to use wrappers to activate this feature:

  • RcAnchor<T> and ArcAnchor<T> emit anchors like &a1 on first occurrence and may emit aliases *a1 later.
  • RcWeakAnchor<T> and ArcWeakAnchor<T> serialize a weak ref: if the strong pointer is gone, it becomes null.
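
use std::rc::Rc;

use indoc::indoc;
use serde::{Deserialize, Serialize};
// Import path assumed for this example; adjust it if the crate exposes RcAnchor elsewhere.
use serde_saphyr::RcAnchor;

// Payload type shared through the anchors below.
#[derive(Deserialize, Serialize)]
struct Node {
    name: String,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    #[derive(Deserialize, Serialize)]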
    struct Doc {
        a: RcAnchor<Node>,
        b: RcAnchor<Node>,
    }

    #[derive(Deserialize, Serialize)]
    struct Bigger {
        primary_a: RcAnchor<Node>,
        doc: Doc,
    }

    let the_a = RcAnchor::from(Rc::new(Node {
        name: "primary_a".to_string(),
    }));

    let data = Bigger {
        primary_a: the_a.clone(),
        doc: Doc {
            a: the_a.clone(),
            b: RcAnchor::from(Rc::new(Node {
                name: "the_b".to_string(),
            })),
        },
    };

    let serialized = serde_saphyr::to_string(&data)?;
    assert_eq!(serialized, String::from(
        indoc! {
            r#"primary_a: &a1
                  name: primary_a
                doc:
                  a: *a1
                  b: &a2
                    name: the_b
            "#}));

    let deserialized: Bigger = serde_saphyr::from_str(&serialized)?;

    assert_eq!(&deserialized.primary_a.name, &deserialized.doc.a.name);
    assert_eq!(&deserialized.doc.b.name, &data.doc.b.name);
    assert!(Rc::ptr_eq(&deserialized.primary_a.0, &deserialized.doc.a.0));

    Ok(())
}

When anchors are highly repetitive and also large, packing them into references can make YAML more human-readable.

Starting from 0.0.7, this library can also deserialize YAML into these anchor structures, and the result is identity-preserving. A field or structure that is defined once and subsequently referenced will exist as a single instance in memory, with all anchor fields pointing to it. This is crucial when the topology of references itself constitutes important information to be transferred.

Robotics

The "robotics" capability enables parsing of YAML extensions commonly used in robotics (ROS, ROS2, etc.). These extensions support conversion functions (deg, rad) and simple mathematical expressions such as deg(180), rad(pi), 1 + 2*(3 - 4/5), or rad(pi/2). The capability is gated behind the robotics feature and is not enabled by default. Additionally, angle_conversions must be set to true in Options.

rad_tag: !radians 0.15 # value in radians, stays in radians
deg_tag: !degrees 180 # value in degrees, converts to radians
expr_complex: 1 + 2*(3 - 4/5) # simple expressions supported
func_deg: deg(180) # value in degrees, converts to radians
func_rad: rad(pi) # value in radians (stays in radians)
hh_mm_secs: -0:30:30.5 # Time
longitude: !radians 8:32:53.2 # Nautical, ETH Zürich Main Building (8°32′53.2″ E)

let options = Options {
    angle_conversions: true, // enable robotics angle parsing
    .. Options::default()
};

let v: RoboFloats = from_str_with_options(yaml, options).expect("parse robotics YAML");

Safety hardening with this feature enabled includes a maximal expression depth, a maximal number of digits, strict underscore placement, and limiting fraction parsing to precision-relevant digits.
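
The deg and rad conversion functions described in this section can be pictured with a small standalone sketch. It handles only a single numeric argument or the constant pi, has none of the hardening listed above, and does not use the crate; angle_to_radians is a hypothetical helper.

```rust
use std::f64::consts::PI;

/// Parse `deg(<arg>)` or `rad(<arg>)` and return the angle in radians.
/// `<arg>` is limited to a decimal number or the constant `pi` here; the
/// robotics feature itself accepts richer expressions.
fn angle_to_radians(text: &str) -> Option<f64> {
    let text = text.trim();
    let (is_degrees, rest) = if let Some(rest) = text.strip_prefix("deg(") {
        (true, rest)
    } else if let Some(rest) = text.strip_prefix("rad(") {
        (false, rest)
    } else {
        return None;
    };
    // Require the closing parenthesis and read the argument between them.
    let arg = rest.strip_suffix(')')?.trim();
    let value = if arg == "pi" { PI } else { arg.parse::<f64>().ok()? };
    // Degrees are converted; radians pass through unchanged.
    Some(if is_degrees { value.to_radians() } else { value })
}
```

Both deg(180) and rad(pi) therefore end up as (approximately) the same value in radians, which matches the comments in the YAML sample above.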

Dependencies

~6.5MB
~184K SLoC