
xmlparser

Pull-based, zero-allocation XML parser

14 releases (8 breaking)

0.9.0 Feb 27, 2019
0.8.0 Dec 13, 2018
0.7.0 Oct 29, 2018
0.5.0 Jun 14, 2018
0.1.0 Dec 15, 2017


6,911 downloads per month
Used in 21 crates (3 directly)

MIT/Apache

66KB
1.5K SLoC


xmlparser is a low-level, pull-based, zero-allocation XML 1.0 parser.

Example

fn main() {
    // Each item is a Result: either a token or a parse error.
    for token in xmlparser::Tokenizer::from("<tagname name='value'/>") {
        println!("{:?}", token);
    }
}
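To make "pull-based, zero-allocation" concrete, here is a toy sketch. It is not xmlparser's code, and the names Piece and Pieces are hypothetical. The caller pulls one item at a time through Iterator::next, and every item is a &str slice borrowed from the input, so no strings are copied. It only separates <...> markup from text and performs no XML validation.

```rust
/// Toy illustration of a pull-based, zero-allocation tokenizer (hypothetical,
/// not xmlparser's implementation). No XML rules are checked.
#[derive(Debug, PartialEq)]
pub enum Piece<'a> {
    Markup(&'a str), // from '<' up to and including the next '>'
    Text(&'a str),   // everything between two markup pieces
}

pub struct Pieces<'a> {
    rest: &'a str, // unconsumed tail of the input; we only re-slice it
    pos: usize,    // byte offset of `rest` within the original input
}

impl<'a> Pieces<'a> {
    pub fn new(input: &'a str) -> Self {
        Pieces { rest: input, pos: 0 }
    }
}

impl<'a> Iterator for Pieces<'a> {
    // Err carries the byte offset where unterminated markup starts.
    type Item = Result<Piece<'a>, usize>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.rest.is_empty() {
            return None;
        }
        if self.rest.starts_with('<') {
            match self.rest.find('>') {
                Some(end) => {
                    // '>' is ASCII, so end + 1 is always a char boundary.
                    let (tag, tail) = self.rest.split_at(end + 1);
                    self.rest = tail;
                    self.pos += tag.len();
                    Some(Ok(Piece::Markup(tag)))
                }
                None => {
                    // Report the error position once, then stop iterating.
                    let at = self.pos;
                    self.rest = "";
                    Some(Err(at))
                }
            }
        } else {
            let end = self.rest.find('<').unwrap_or(self.rest.len());
            let (text, tail) = self.rest.split_at(end);
            self.rest = tail;
            self.pos += text.len();
            Some(Ok(Piece::Text(text)))
        }
    }
}
```

For example, Pieces::new("<a>hi</a>") yields Markup("<a>"), Text("hi") and Markup("</a>") in turn, while "x<b" yields Text("x") followed by Err(1).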

Why a new library

The main idea of this library is to provide a fast, low-level and complete XML parser.

Unlike other XML parsers, this one returns tokens that hold StrSpan objects rather than plain &str/&[u8] data. A StrSpan records the position of the data in the original document, which is very useful if you want to post-process tokens further and report errors with a meaningful position.

In effect, this is an XML parsing framework that can be used to write parsers for XML-based formats, like SVG, and to construct a DOM.
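A minimal sketch of the idea behind a position-carrying span, assuming nothing about StrSpan's real fields or methods: the hypothetical Span type below borrows a slice of the document and remembers its starting byte offset, so later stages can both read the text and point back into the source.

```rust
/// Hypothetical span type for illustration; it is NOT xmlparser's StrSpan.
/// It borrows a slice of the source and remembers where that slice starts.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Span<'a> {
    text: &'a str,
    start: usize, // byte offset of `text` within the original document
}

impl<'a> Span<'a> {
    /// Creates a span for `source[start..end]`, or None if the range is
    /// out of bounds or does not fall on char boundaries.
    pub fn new(source: &'a str, start: usize, end: usize) -> Option<Span<'a>> {
        source.get(start..end).map(|text| Span { text, start })
    }

    /// The borrowed text; no allocation takes place.
    pub fn as_str(&self) -> &'a str {
        self.text
    }

    /// Byte offset where the span begins in the original document.
    pub fn start(&self) -> usize {
        self.start
    }

    /// Byte offset just past the end of the span.
    pub fn end(&self) -> usize {
        self.start + self.text.len()
    }
}
```

A post-processing stage that rejects, say, an unknown attribute name can then report span.start() instead of just the offending string.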

At the time of writing, the only alternative was quick-xml (v0.10), which supported neither DTDs nor token positions.

If you are looking for a higher-level solution, check out roxmltree.

Benefits

  • All tokens contain StrSpan objects that record the position of the data in the original document.
  • Good error handling. All error types contain the position (line:column) where the error occurred.
  • No heap allocations.
  • No dependencies.
  • Tiny. ~1500 LOC and ~40KiB in the release build, according to cargo-bloat.
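Line:column positions like the ones mentioned above can be derived from a plain byte offset. The line_col helper below is a hypothetical sketch, not part of xmlparser's API; it counts lines from 1 and columns in characters from 1, which may differ from the crate's own conventions.

```rust
/// Hypothetical helper (not part of xmlparser): converts a byte offset into
/// a 1-based (line, column) pair. Columns are counted in chars, not bytes.
pub fn line_col(source: &str, offset: usize) -> (usize, usize) {
    let mut line = 1;
    let mut col = 1;
    for (i, ch) in source.char_indices() {
        if i >= offset {
            break; // reached the target position
        }
        if ch == '\n' {
            line += 1;
            col = 1;
        } else {
            col += 1;
        }
    }
    (line, col)
}
```

Scanning from the start costs O(offset), which is acceptable because it only runs when an error is actually reported.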

Limitations

  • Currently, only ENTITY declarations are parsed from the DOCTYPE; other declarations are ignored.
  • No tree structure validation, so a document like <root><child></root></child> will be parsed without errors; you should check for this manually. On the other hand, <a/><a/> will lead to an error, because a document may have only one root element.
  • Duplicated attributes are not an error, so a document like <item a="v1" a="v2"/> will be parsed without errors; you should check for this manually.
  • UTF-8 only.
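Because nesting and duplicate attributes are left to the caller, one way to perform the manual check is a stack of open element names plus a list of attribute names seen in the current start tag. The sketch below runs over a hypothetical, simplified Event type; xmlparser's real token type differs, so its tokens would first have to be mapped onto something like it.

```rust
/// Hypothetical, simplified token stream; xmlparser's real Token enum differs.
#[derive(Debug)]
pub enum Event<'a> {
    Start(&'a str), // `<name`
    Attr(&'a str),  // attribute name inside the current start tag
    Empty,          // `/>` closes the element just opened
    Close(&'a str), // `</name>`
}

#[derive(Debug, PartialEq)]
pub enum CheckError<'a> {
    Mismatched { expected: &'a str, found: &'a str },
    UnexpectedClose(&'a str),
    DuplicateAttr(&'a str),
    Unclosed(&'a str),
}

/// Checks tag nesting with a stack and rejects duplicate attribute names.
/// Note: unlike the parser itself, this check allocates (two Vecs).
pub fn check<'a>(events: &[Event<'a>]) -> Result<(), CheckError<'a>> {
    let mut open: Vec<&'a str> = Vec::new();
    let mut attrs: Vec<&'a str> = Vec::new(); // names in the current start tag
    for ev in events {
        match *ev {
            Event::Start(name) => {
                open.push(name);
                attrs.clear(); // a new start tag begins a fresh attribute set
            }
            Event::Attr(name) => {
                if attrs.contains(&name) {
                    return Err(CheckError::DuplicateAttr(name));
                }
                attrs.push(name);
            }
            Event::Empty => {
                open.pop();
            }
            Event::Close(name) => match open.pop() {
                Some(expected) if expected == name => {}
                Some(expected) => {
                    return Err(CheckError::Mismatched { expected, found: name })
                }
                None => return Err(CheckError::UnexpectedClose(name)),
            },
        }
    }
    // Anything still open at the end was never closed.
    match open.pop() {
        Some(name) => Err(CheckError::Unclosed(name)),
        None => Ok(()),
    }
}
```

Fed the events for <root><child></root></child>, the check fails at </root> because the innermost open element is still child.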

Safety

  • The library must not panic. Any panic is considered a critical bug and should be reported.
  • The library forbids unsafe code.
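Both rules can be expressed in ordinary Rust. The sketch below is illustrative rather than taken from xmlparser's source: the crate-level #![forbid(unsafe_code)] lint turns any unsafe block into a compile error, and byte_at and quote_at (hypothetical helper names) use checked access so that out-of-range input becomes an error value instead of a panic.

```rust
// Crate-level lint, placed at the top of lib.rs/main.rs: any `unsafe` block
// in this crate becomes a compile error that an inner `allow` cannot override.
#![forbid(unsafe_code)]

/// Panic-free byte lookup: returns None instead of panicking on an
/// out-of-range index, so malformed input becomes an error, not a crash.
pub fn byte_at(text: &str, pos: usize) -> Option<u8> {
    text.as_bytes().get(pos).cloned()
}

/// Hypothetical helper: reads the quote that opens an attribute value,
/// returning the offending position on failure rather than panicking.
pub fn quote_at(text: &str, pos: usize) -> Result<u8, usize> {
    match byte_at(text, pos) {
        Some(q @ b'"') | Some(q @ b'\'') => Ok(q),
        _ => Err(pos),
    }
}
```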

Dependency

Rust >= 1.18

License

Licensed under either of

  • Apache License, Version 2.0
  • MIT license

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
