12 breaking releases
Uses new Rust 2024
| Version | Released |
|---|---|
| 0.13.0 | Oct 26, 2025 |
| 0.11.0 | Sep 21, 2025 |
htmlite
An HTML manipulation toolkit
htmlite is a lightweight HTML toolkit for parsing, manipulating, and generating HTML.
Examples
Parsing a fragment of HTML
use htmlite::NodeArena;
let arena = NodeArena::new();
htmlite::parse(&arena, "<h1>Hello, <i>world!</i></h1>").unwrap();
Selecting elements
use htmlite::{NodeArena, Node};
let html = r#"
<ul>
<li>Foo</li>
<li>Bar</li>
<li>Baz</li>
</ul>
"#;
let arena = NodeArena::new();
let root = htmlite::parse(&arena, html).unwrap();
for element in root.descendants().select("li") {
    assert_eq!(&*element.name(), "li");
}
Accessing element attributes
use htmlite::{NodeArena, Node};
let arena = NodeArena::new();
let root = htmlite::parse(&arena, r#"<input name="foo" value="bar" readonly>"#).unwrap();
let element = root.descendants().select(r#"input[name="foo"]"#).next().unwrap();
assert_eq!(element.attr("value").as_deref(), Some("bar"));
assert_eq!(element.attr("readonly").as_deref(), Some(""));
Serializing HTML and inner HTML
use htmlite::NodeArena;
let arena = NodeArena::new();
let root = htmlite::parse(&arena, "<h1>Hello, <i>world!</i></h1>").unwrap();
let h1 = root.descendants().select("h1").next().unwrap();
assert_eq!(h1.html(), "<h1>Hello, <i>world!</i></h1>");
assert_eq!(h1.inner_html(), "Hello, <i>world!</i>");
Manipulating the DOM
use htmlite::NodeArena;
let html = "<html><body>hello<p class=\"hello\">REMOVE ME</p></body></html>";
let arena = NodeArena::new();
let root = htmlite::parse(&arena, html).unwrap();
for el in root.descendants().select(".hello") {
    el.detach();
}
assert_eq!(root.html(), "<html><body>hello</body></html>");
Generating HTML
use htmlite::{NodeArena, html};
let h = NodeArena::new();
let form = html!(
    h,
    (form
        ["method" => "POST"]
        (input ["value" => "hello", "type" => "text"])
        (button (text "Submit"))
    )
);
assert_eq!(form.html(), r#"<form method="POST"><input value="hello" type="text"><button>Submit</button></form>"#);
When should you use this?
This is not a "browser-grade" HTML parser, but it is close!
Specifically, the tokenizer is spec compliant and passes all the html5lib tokenizer tests.
So htmlite will accept any valid HTML "construct" like numeric & named character references and void elements.
However, the tree-builder does not follow the spec, and this is deliberate. A spec-compliant tree-builder may restructure your markup for a multitude of reasons: badly nested tags, child elements that don't conform to the content model of their parent, missing end tags, and so on. The tree-builder in this library takes a simpler approach: it parses any well-balanced HTML and outputs a tree that corresponds to that markup, exactly as written.
So this library works well when you are parsing the output of HTML-generating tools like SSGs or Markdown parsers. Tools like these don't forget to add end tags :)
On the other hand, parsing random web content is more of a gamble.
For example, many sites rely on the fact that you do not need to close your <p> tags.
This library will fail on such markup.
TL;DR: if your HTML looks like well-formed XML when you squint, this library's HTML parser is for you.
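To make "well-balanced" a bit more concrete, here is a rough, standard-library-only sketch of a balance check. It is not htmlite's tree-builder and does not use its API: `is_well_balanced` and `VOID_ELEMENTS` are hypothetical names made up for illustration, and the scanner is deliberately naive (it ignores comments, doctypes, raw-text elements like script, and `>` inside attribute values). It only shows the idea that every non-void start tag needs a matching end tag, in the right order.

```rust
/// Void elements never take an end tag (list taken from the HTML standard).
const VOID_ELEMENTS: [&str; 13] = [
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
];

/// Returns true when every non-void start tag is closed by a matching
/// end tag in the right order. Illustration only, not a real tokenizer.
fn is_well_balanced(html: &str) -> bool {
    let mut open: Vec<String> = Vec::new();
    let mut rest = html;

    while let Some(start) = rest.find('<') {
        let after = &rest[start + 1..];
        // A `<` without a closing `>` is treated as malformed here.
        let Some(end) = after.find('>') else { return false };
        let tag = &after[..end];
        rest = &after[end + 1..];

        if let Some(name) = tag.strip_prefix('/') {
            // End tag: it must close the most recently opened element.
            let name = name.trim().to_ascii_lowercase();
            if open.pop().as_deref() != Some(name.as_str()) {
                return false;
            }
        } else {
            // Start tag: the name runs up to the first whitespace or `/`.
            let name = tag
                .split(|c: char| c.is_whitespace() || c == '/')
                .next()
                .unwrap_or("")
                .to_ascii_lowercase();
            if !VOID_ELEMENTS.contains(&name.as_str()) {
                open.push(name);
            }
        }
    }

    // Anything still open at the end is a missing end tag.
    open.is_empty()
}
```

Under these assumptions, a list with every `<li>` closed passes, a lone `<input>` passes because it is a void element, and `<p>one<p>two` is rejected because both `<p>` elements are still open at the end of the input.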
Adjacent crates
scraper: An inspiration for this crate. Uses html5ever. You get browser-grade HTML parsing with a browser-grade dependency tree.
kuchiki: As far as I understand, this was the predecessor to scraper. Same thing about html5ever.
tl: A bit too lenient, while also failing on valid HTML. Additionally, it does some weird error recovery that I did not want.
html5gum: Only tokenizes. I could have used this instead of writing my own tokenizer ... but where is the fun in that?
lol-html: Very odd API. A bit too dependency-heavy for my liking. Different use case.
Thank you
This crate would not be possible without SimonSapin's rust-forest experiment. The combination of using an arena allocator and Cell-wrapped references is at the root of why this API is as ergonomic as it is. Brilliant design. Thank you for your work!
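For readers unfamiliar with that pattern, here is a minimal standard-library-only sketch of the general idea. It is not htmlite's or rust-forest's actual code: the `Node` type and its methods below are hypothetical, and `Box::leak` is used only as a stand-in for arena allocation (a real arena would free all nodes together when it is dropped). The point is that links stored in `Cell` can be rewired through shared `&Node` references, with no `&mut` and no `RefCell` borrow bookkeeping.

```rust
use std::cell::Cell;

/// Hypothetical tree node: every link is a shared reference inside a `Cell`.
struct Node<'a> {
    name: &'static str,
    parent: Cell<Option<&'a Node<'a>>>,
    first_child: Cell<Option<&'a Node<'a>>>,
    next_sibling: Cell<Option<&'a Node<'a>>>,
}

impl<'a> Node<'a> {
    /// Stand-in for arena allocation (assumption): leaking a `Box` also
    /// yields a reference that stays valid for as long as the tree is used.
    fn alloc(name: &'static str) -> &'a Node<'a> {
        Box::leak(Box::new(Node {
            name,
            parent: Cell::new(None),
            first_child: Cell::new(None),
            next_sibling: Cell::new(None),
        }))
    }

    /// Appends `child` after the current last child of `self`.
    fn append(&'a self, child: &'a Node<'a>) {
        child.parent.set(Some(self));
        match self.first_child.get() {
            None => self.first_child.set(Some(child)),
            Some(mut last) => {
                while let Some(next) = last.next_sibling.get() {
                    last = next;
                }
                last.next_sibling.set(Some(child));
            }
        }
    }

    /// Unlinks `self` from its parent and siblings, using only `&self`.
    fn detach(&self) {
        let Some(parent) = self.parent.take() else { return };
        let next = self.next_sibling.take();
        // Walk the chain of links until we find the one pointing at `self`.
        let mut link = &parent.first_child;
        while let Some(node) = link.get() {
            if std::ptr::eq(node, self) {
                link.set(next);
                return;
            }
            link = &node.next_sibling;
        }
    }

    /// Collects the names of the direct children, in order.
    fn child_names(&self) -> Vec<&'static str> {
        let mut names = Vec::new();
        let mut cursor = self.first_child.get();
        while let Some(node) = cursor {
            names.push(node.name);
            cursor = node.next_sibling.get();
        }
        names
    }
}
```

Because every node handle is a plain `&Node`, handles are `Copy`, can be held while iterating, and can be passed around freely, which is presumably what keeps an API built this way pleasant to use.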