
lithtml

A lightweight and fast HTML parser for Rust, designed to handle both full HTML documents and fragments efficiently.

1 unstable release

0.6.0 Jan 11, 2025


MIT license

51KB
591 lines


A lightweight and fast HTML/XHTML parser for Rust, designed to handle both full HTML documents and fragments.

This parser is built on the Pest parser generator and is forked from the html-parser crate.

Features

  • Parse HTML and XHTML (but not XML processing instructions)
  • Parse HTML documents
  • Parse HTML fragments
  • Parse empty documents
  • Parse documents and fragments with the same API
  • Parse custom, non-standard elements such as <cat/>, <Cat/> and <C4-t/>
  • Remove comments
  • Remove dangling elements
  • Iterate over all nodes in the DOM tree
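Iterating over every node amounts to a depth-first walk of the tree. The sketch below is illustrative only: `Node` and `Nodes` are hypothetical, simplified types defined here, not lithtml's actual DOM types or iterator. The walk uses an explicit stack, so deeply nested markup does not grow the call stack.

```rust
// Hypothetical, simplified DOM types for illustration; not lithtml's API.
#[derive(Debug)]
enum Node {
    Text(String),
    Element { name: String, children: Vec<Node> },
}

/// Depth-first, pre-order iterator over every node in a tree,
/// driven by an explicit stack instead of recursion.
struct Nodes<'a> {
    stack: Vec<&'a Node>,
}

impl<'a> Nodes<'a> {
    fn new(roots: &'a [Node]) -> Self {
        // Push roots in reverse so the first root is popped (visited) first.
        Nodes { stack: roots.iter().rev().collect() }
    }
}

impl<'a> Iterator for Nodes<'a> {
    type Item = &'a Node;

    fn next(&mut self) -> Option<Self::Item> {
        let node = self.stack.pop()?;
        if let Node::Element { children, .. } = node {
            // Reverse again so children are yielded in document order.
            self.stack.extend(children.iter().rev());
        }
        Some(node)
    }
}

fn main() {
    // Roughly: <div><h1>Hello</h1><p>world</p></div>
    let tree = vec![Node::Element {
        name: "div".into(),
        children: vec![
            Node::Element { name: "h1".into(), children: vec![Node::Text("Hello".into())] },
            Node::Element { name: "p".into(), children: vec![Node::Text("world".into())] },
        ],
    }];

    let labels: Vec<String> = Nodes::new(&tree)
        .map(|n| match n {
            Node::Text(t) => format!("#text({t})"),
            Node::Element { name, .. } => name.clone(),
        })
        .collect();

    // Prints: div h1 #text(Hello) p #text(world)
    println!("{}", labels.join(" "));
}
```

Pushing children in reverse is what keeps the output in source order: the stack is last-in, first-out, so the first child must be pushed last.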

Examples

Parse an HTML document

    use lithtml::Dom;

    fn main() {
        let html = r#"
            <!doctype html>
            <html lang="en">
                <head>
                    <meta charset="utf-8">
                    <title>Html parser</title>
                </head>
                <body>
                    <h1 id="a" class="b c">Hello world</h1>
                    </h1> <!-- comments & dangling elements are ignored -->
                </body>
            </html>"#;

        assert!(Dom::parse(html).is_ok());
    }
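The comment in the example above notes that comments and dangling elements are ignored. lithtml handles this inside its Pest-based parser; the sketch below instead shows one possible stack-based approach over a hypothetical, already-tokenized input (`Token` and `clean` are invented for illustration). It drops comments and keeps a closing tag only when it closes the innermost open element, so the stray `</h1>` would be discarded.

```rust
// Hypothetical token stream standing in for a tokenizer's output;
// lithtml itself parses with a Pest grammar, not with this code.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Open(String),
    Close(String),
    Text(String),
    Comment(String),
}

/// Drops comments, and keeps a closing tag only if it closes the innermost
/// currently open element. Simplification: a closing tag for an outer element
/// is also dropped, and elements still open at the end are left unclosed.
fn clean(tokens: &[Token]) -> Vec<Token> {
    let mut open: Vec<&str> = Vec::new();
    let mut out = Vec::new();
    for tok in tokens {
        match tok {
            Token::Comment(_) => {} // comments are removed
            Token::Open(name) => {
                open.push(name.as_str());
                out.push(tok.clone());
            }
            Token::Close(name) => {
                if open.last() == Some(&name.as_str()) {
                    open.pop();
                    out.push(tok.clone());
                }
                // Otherwise this is a dangling closing tag and is dropped.
            }
            Token::Text(_) => out.push(tok.clone()),
        }
    }
    out
}

fn main() {
    use Token::*;
    // <h1>Hello world</h1></h1><!-- comment -->
    let input = vec![
        Open("h1".into()),
        Text("Hello world".into()),
        Close("h1".into()),
        Close("h1".into()),
        Comment(" comment ".into()),
    ];
    let cleaned = clean(&input);
    // Prints: [Open("h1"), Text("Hello world"), Close("h1")]
    println!("{cleaned:?}");
}
```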

Parse an HTML fragment

    use lithtml::Dom;

    fn main() {
        let html = "<div id=cat />";
        assert!(Dom::parse(html).is_ok());
    }
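Because documents and fragments share one API, a parser has to decide what kind of input it received. The heuristic below is a hypothetical sketch, not lithtml's logic: it treats whitespace-only input as empty, input starting with `<!doctype` or `<html` (ignoring ASCII case and leading whitespace) as a document, and anything else as a fragment. `Kind`, `classify`, and `starts_with_ci` are names made up for this example.

```rust
// Hypothetical classification for illustration; not lithtml's API.
#[derive(Debug, PartialEq)]
enum Kind {
    Document,
    Fragment,
    Empty,
}

/// ASCII case-insensitive prefix test. Comparing bytes avoids slicing
/// the &str in the middle of a multi-byte UTF-8 character.
fn starts_with_ci(s: &str, prefix: &str) -> bool {
    s.len() >= prefix.len()
        && s.as_bytes()[..prefix.len()].eq_ignore_ascii_case(prefix.as_bytes())
}

fn classify(input: &str) -> Kind {
    let rest = input.trim_start();
    if rest.is_empty() {
        Kind::Empty
    } else if starts_with_ci(rest, "<!doctype") || starts_with_ci(rest, "<html") {
        Kind::Document
    } else {
        Kind::Fragment
    }
}

fn main() {
    for input in ["<!doctype html><html></html>", "<div id=cat />", ""] {
        // Prints Document, Fragment, Empty on separate lines.
        println!("{:?}", classify(input));
    }
}
```

A prefix check like this is deliberately crude: it would, for instance, classify a document that opens with a comment before its doctype as a fragment, which is one reason a real parser would more likely decide from the parse tree itself.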

Dependencies

~2.4–3.5MB
~71K SLoC