1 unstable release

0.8.19 Feb 13, 2024

#1562 in Parser implementations

MIT license

255KB
4.5K SLoC

xml-no-std, an xml-rs fork for no_std


Documentation

xml-no-std is a no_std fork of the popular XML library xml-rs for the Rust programming language. The crate sacrifices streaming capabilities and performance for no_std compliance (alloc is still needed).

All credit goes to netvl and kornelski. Thank you for the great work 💚

Motivation

xml-no-std was created to support the XML encoding rules in the librasn ASN.1 framework. Of the various ASN.1 encoding rules, the XML encoding rules are usually not chosen for performance-critical use cases, so the performance losses are tolerable.

Trade-offs

In order to be compliant with no_std environments, xml-no-std operates on Iterator<Item = &u8> for reading and alloc::string::String for writing instead of std::io::Read and std::io::Write. Stream reading is therefore not supported.
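To make this API shape concrete, here is a minimal sketch that is independent of xml-no-std. The helpers count_start_tags and write_element are hypothetical and exist only for illustration. They show a function that consumes any Iterator<Item = &u8> and one that appends to a String through the fmt::Write trait. The sketch imports from std so it runs as-is; a no_std crate would use core::fmt::Write and alloc::string::String instead.

```rust
use std::fmt::Write; // core::fmt::Write in a no_std crate

/// Hypothetical helper: counts `<` bytes that begin a start tag,
/// i.e. that are not followed by `/`, `?` or `!`.
fn count_start_tags<'a, I>(bytes: I) -> usize
where
    I: Iterator<Item = &'a u8>,
{
    let mut count = 0;
    let mut prev_was_lt = false;
    for &b in bytes {
        if prev_was_lt && b != b'/' && b != b'?' && b != b'!' {
            count += 1;
        }
        prev_was_lt = b == b'<';
    }
    count
}

/// Hypothetical helper: appends `<name>text</name>` to an in-memory String.
/// No escaping is performed; this only illustrates the output side.
fn write_element(out: &mut String, name: &str, text: &str) -> std::fmt::Result {
    write!(out, "<{name}>{text}</{name}>")
}

fn main() {
    let doc = "<a><b>hi</b></a>";
    // Any byte slice already held in memory can serve as the input.
    println!("{}", count_start_tags(doc.as_bytes().iter()));

    let mut out = String::new();
    write_element(&mut out, "b", "hi").unwrap();
    println!("{out}");
}
```

Because the input is a plain iterator over borrowed bytes, the whole document has to be in memory before parsing starts, which is why stream reading is not available.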

As far as performance is concerned, the changes xml-no-std makes hit hardest when reading XML documents whose elements have many attributes. xml-no-std uses an alloc::collections::BTreeSet for storing XML attributes, which is suboptimal for elements with many attributes. There's definitely room for improvement here, so contributions are very welcome.
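For context, the hash-based collections live in std, while alloc only offers ordered ones such as BTreeSet and BTreeMap. This README does not say how the set is used internally, so the sketch below only assumes one plausible use, detecting duplicate attribute names. The function first_duplicate is hypothetical and not part of the crate.

```rust
use std::collections::BTreeSet; // alloc::collections::BTreeSet in a no_std crate

/// Hypothetical duplicate check: returns the first attribute name that repeats.
fn first_duplicate<'a>(names: &[&'a str]) -> Option<&'a str> {
    let mut seen = BTreeSet::new();
    for &name in names {
        // `insert` returns false if the value was already present.
        // Each insert needs O(log n) string comparisons.
        if !seen.insert(name) {
            return Some(name);
        }
    }
    None
}

fn main() {
    println!("{:?}", first_duplicate(&["id", "class", "id"]));
    println!("{:?}", first_duplicate(&["id", "class"]));
}
```

With an ordered set, every lookup or insert compares strings along a tree path, so the cost per element grows with its attribute count.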

Some ballpark figures from my own dev machine:

Bench            xml-rs                         xml-no-std
read             43,255 ns/iter (+/- 1,498)     57,263 ns/iter (+/- 1,121)
read_lots_attr   426,440 ns/iter (+/- 3,932)    6,122,947 ns/iter (+/- 609,079)
write            7,405 ns/iter (+/- 31)         17,303 ns/iter (+/- 134)
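As a rough reading aid, the relative slowdown can be derived from these figures by dividing the xml-no-std time by the xml-rs time. The helper slowdown below is hypothetical, and the results inherit the noise of the single-machine measurements.

```rust
/// Hypothetical helper: how many times slower the fork is than xml-rs.
fn slowdown(xml_rs_ns: f64, no_std_ns: f64) -> f64 {
    no_std_ns / xml_rs_ns
}

fn main() {
    // Mean ns/iter values copied from the benchmark figures above.
    let benches = [
        ("read", 43_255.0, 57_263.0),
        ("read_lots_attr", 426_440.0, 6_122_947.0),
        ("write", 7_405.0, 17_303.0),
    ];
    for (name, xml_rs, no_std) in benches {
        println!("{name}: {:.1}x", slowdown(xml_rs, no_std));
    }
}
```

This works out to roughly 1.3x for read, 14.4x for read_lots_attr and 2.3x for write, so the attribute-heavy case is the clear outlier.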

Building and using

xml-no-std uses Cargo, so add it with cargo add xml-no-std or modify Cargo.toml:

[dependencies]
xml-no-std = "0.8.16"

The package exposes a single crate, which is imported in code as xml_no_std.

Reading XML documents

xml_no_std::reader::EventReader requires an Iterator over &u8 items to read from.

EventReader implements the IntoIterator trait, so you can use it in a for loop directly:

use std::fs::File;
use std::io::Read;

use xml_no_std::reader::{EventReader, XmlEvent};

fn main() -> std::io::Result<()> {
    // The reader needs the whole document in memory, so read the file up front.
    let mut input = String::new();
    File::open("file.xml")?.read_to_string(&mut input)?;

    let parser = EventReader::new(input.as_bytes().iter());
    let mut depth = 0;
    for e in parser {
        match e {
            Ok(XmlEvent::StartElement { name, .. }) => {
                println!("{:spaces$}+{name}", "", spaces = depth * 2);
                depth += 1;
            }
            Ok(XmlEvent::EndElement { name }) => {
                depth -= 1;
                println!("{:spaces$}-{name}", "", spaces = depth * 2);
            }
            Err(e) => {
                eprintln!("Error: {e}");
                break;
            }
            // There's more: https://docs.rs/xml-rs/latest/xml/reader/enum.XmlEvent.html
            _ => {}
        }
    }

    Ok(())
}

Document parsing can end normally or with an error. Regardless of the exact cause, parsing stops at that point and the iterator terminates normally.

You can also have finer control over when to pull the next event from the parser using its own next() method:

match parser.next() {
    ...
}

Upon the end of the document or an error, the parser remembers the last event and returns it from every subsequent next() call. If the iterator is used instead, it yields the error or end-of-document event once and produces None afterwards.

It is also possible to tweak the parsing process a little using the xml_no_std::reader::ParserConfig structure. See its documentation for more information and examples.
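The difference between the two behaviours can be sketched with a toy model. Event, ToyParser and Events below are hypothetical stand-ins, not the crate's types, and the error case is omitted for brevity (it would be remembered in the same way as the end-of-document event).

```rust
/// Hypothetical event type standing in for the crate's reader events.
#[derive(Clone, Debug, PartialEq)]
enum Event {
    Text(String),
    EndDocument,
}

/// Toy pull parser: once the final event is reached, `next` keeps returning it.
struct ToyParser {
    pending: Vec<Event>, // events still to deliver, stored in reverse order
    last: Option<Event>, // remembered final event
}

impl ToyParser {
    fn new(mut events: Vec<Event>) -> Self {
        events.reverse();
        ToyParser { pending: events, last: None }
    }

    fn next(&mut self) -> Event {
        if let Some(done) = &self.last {
            return done.clone();
        }
        let ev = self.pending.pop().unwrap_or(Event::EndDocument);
        if ev == Event::EndDocument {
            self.last = Some(ev.clone());
        }
        ev
    }
}

/// Iterator wrapper: yields the final event once, then `None`.
struct Events {
    parser: ToyParser,
    finished: bool,
}

impl Iterator for Events {
    type Item = Event;

    fn next(&mut self) -> Option<Event> {
        if self.finished {
            return None;
        }
        let ev = self.parser.next();
        if ev == Event::EndDocument {
            self.finished = true;
        }
        Some(ev)
    }
}

fn main() {
    let script = vec![Event::Text("hi".into()), Event::EndDocument];

    // Pulling manually: the final event repeats on every later call.
    let mut parser = ToyParser::new(script.clone());
    for _ in 0..4 {
        println!("{:?}", parser.next());
    }

    // Iterating: the final event is seen once, then the loop ends.
    let events = Events { parser: ToyParser::new(script), finished: false };
    for ev in events {
        println!("{ev:?}");
    }
}
```

In practice this means a manual next() loop needs its own exit condition, while a for loop over the iterator ends on its own.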

Parsing untrusted inputs

The parser is written in the safe subset of Rust, so by Rust's guarantees the worst it can do is panic. You can use ParserConfig to set limits on the maximum lengths of names, attributes, text, entities, etc.

Writing XML documents

xml-no-std also provides an event writer much like the StAX event writer. Unlike xml-rs, which writes to any Write implementor, it collects the document in an alloc::string::String that you retrieve with into_inner().

use std::io::{self, Write};
use xml_no_std::writer::{EmitterConfig, XmlEvent};

/// A simple demo syntax where "+foo" makes `<foo>`, "-foo" makes `</foo>`
fn make_event_from_line(line: &str) -> XmlEvent {
    let line = line.trim();
    if let Some(name) = line.strip_prefix("+") {
        XmlEvent::start_element(name).into()
    } else if line.starts_with("-") {
        XmlEvent::end_element().into()
    } else {
        XmlEvent::characters(line).into()
    }
}

fn main() -> io::Result<()> {
    let input = io::stdin();
    // `write_all` takes `&mut self`, so the handle must be mutable.
    let mut out = io::stdout();
    let mut writer = EmitterConfig::new()
        .perform_indent(true)
        .create_writer();

    let mut line = String::new();
    loop {
        line.clear();
        let bytes_read = input.read_line(&mut line)?;
        if bytes_read == 0 {
            break; // EOF
        }

        let event = make_event_from_line(&line);
        if let Err(e) = writer.write(event) {
            panic!("Write error: {e}")
        }
    }
    out.write_all(writer.into_inner().as_bytes())
}

The code example above also demonstrates how to create a writer from its configuration. A similar approach works with EventReader.

The library provides an XML event building DSL which helps to construct complex events, e.g. ones having namespace definitions. Some examples:

// <a:hello a:param="value" xmlns:a="urn:some:document">
XmlEvent::start_element("a:hello").attr("a:param", "value").ns("a", "urn:some:document")

// <hello b:config="value" xmlns="urn:default:uri">
XmlEvent::start_element("hello").attr("b:config", "value").default_ns("urn:default:uri")

// <![CDATA[some unescaped text]]>
XmlEvent::cdata("some unescaped text")

Of course, one can create XmlEvent enum variants directly instead of using the builder DSL. There are more examples in the xml_no_std::writer::XmlEvent documentation.

The writer has multiple configuration options; see EmitterConfig documentation for more information.

Bug reports

Please report issues concerning core XML reading and writing at: https://github.com/kornelski/xml-rs/issues. Please report issues concerning the no-std fork at: https://github.com/6d7a/xml-no-std/issues.

No runtime deps