2 releases

0.1.1 Jun 27, 2023
0.1.0 Jun 27, 2023

#2035 in Parser implementations

282 downloads per month
Used in 3 crates (2 directly)

MIT/Apache

72KB
1.5K SLoC

htmlparser

htmlparser is a low-level, pull-based, zero-allocation HTML parser.

Example

for token in htmlparser::Tokenizer::from("<tagname name='value'/>") {
    println!("{:?}", token);
}

Why a new library?

This library is a fork of xmlparser, adjusted to parse HTML.

Benefits

  • All tokens contain StrSpan structs which represent the position of the substring in the original document.
  • Good error processing. All error types contain the position (line:column) where the error occurred.
  • No heap allocations.
  • No dependencies.
  • Tiny. ~1400 LOC and ~30KiB in the release build according to cargo-bloat.
  • Supports no_std builds. To use without the standard library, disable the default features.
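
To illustrate how a byte position in the original document relates to the line:column form used in error messages, here is a minimal sketch. The `line_col` helper is hypothetical and is not part of htmlparser's API; it assumes 1-based lines and columns, with columns counted in characters.

```rust
/// Hypothetical helper (not part of htmlparser): converts a byte offset
/// into a 1-based (line, column) pair. Columns are counted in characters.
fn line_col(text: &str, offset: usize) -> (usize, usize) {
    let mut line = 1;
    let mut col = 1;
    for (i, ch) in text.char_indices() {
        // Stop once we reach the character that starts at `offset`.
        if i >= offset {
            break;
        }
        if ch == '\n' {
            // A newline starts a new line and resets the column.
            line += 1;
            col = 1;
        } else {
            col += 1;
        }
    }
    (line, col)
}
```

For example, given `let doc = "<html>\n  <body class='x'>\n</html>";`, calling `line_col(doc, doc.find("class").unwrap())` yields `(2, 9)`. Like the tokenizer, the helper only reads the input and allocates nothing.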

Limitations

  • Currently, only ENTITY objects are parsed from the DOCTYPE. All others are ignored.
  • No tree structure validation. Markup such as <root><child></root></child>, or a string without a root element, is parsed without errors, so you should check for this manually. On the other hand, <a/><a/> leads to an error.
  • Duplicate attributes are not an error. Markup such as <item a="v1" a="v2"/> is parsed without errors, so you should check for this manually.
  • UTF-8 only.
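
The manual checks mentioned above (tag balance and duplicate attributes) could be sketched roughly as follows. The `Event` and `CheckError` types and the `check` function are hypothetical and exist only for illustration; they are not htmlparser's token types, so real tokens would first need to be mapped onto them. The sketch also assumes every element must be closed explicitly, which ignores HTML void elements such as `<br>`, and unlike the tokenizer it allocates on the heap.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified event type (NOT htmlparser's token type).
#[derive(Debug, Clone, Copy)]
enum Event<'a> {
    Open(&'a str),  // start of an element: `<name`
    Attr(&'a str),  // an attribute name inside the current start tag
    SelfClose,      // `/>` closing the element that was just opened
    Close(&'a str), // `</name>`
}

/// Hypothetical error type for the manual checks.
#[derive(Debug, PartialEq)]
enum CheckError {
    DuplicateAttribute(String),
    MismatchedClose { expected: Option<String>, found: String },
    Unclosed(String),
}

/// Checks that tags are balanced and that no start tag repeats an attribute.
fn check(events: &[Event<'_>]) -> Result<(), CheckError> {
    let mut stack: Vec<&str> = Vec::new(); // currently open elements
    let mut seen: HashSet<&str> = HashSet::new(); // attributes of the current tag
    for ev in events {
        match *ev {
            Event::Open(name) => {
                stack.push(name);
                seen.clear();
            }
            Event::Attr(name) => {
                // `insert` returns false if the name was already present.
                if !seen.insert(name) {
                    return Err(CheckError::DuplicateAttribute(name.to_owned()));
                }
            }
            Event::SelfClose => {
                stack.pop();
            }
            Event::Close(name) => match stack.pop() {
                Some(top) if top == name => {}
                other => {
                    return Err(CheckError::MismatchedClose {
                        expected: other.map(str::to_owned),
                        found: name.to_owned(),
                    })
                }
            },
        }
    }
    // Anything left on the stack was opened but never closed.
    match stack.pop() {
        Some(name) => Err(CheckError::Unclosed(name.to_owned())),
        None => Ok(()),
    }
}
```

With this sketch, the events for `<root><child></root></child>` fail with `MismatchedClose` at `</root>`, and the events for `<item a="v1" a="v2"/>` fail with `DuplicateAttribute("a")`.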

Safety

  • The library must not panic. Any panic is considered a critical bug and should be reported.
  • The library forbids unsafe code.

License

Licensed under either of

  • Apache License, Version 2.0
  • MIT license

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
