8 releases (4 breaking)
Version | Released |
---|---|
0.5.0 | Apr 28, 2019 |
0.4.0 | Apr 14, 2019 |
0.3.3 | Apr 11, 2019 |
0.2.0 | Apr 6, 2019 |
0.1.0 | Apr 5, 2019 |
#1899 in Data structures
25 downloads per month
47KB
991 lines
HTML reader
Description
This library reads HTML strings and converts them into a tree of nodes. The tree makes it easier to process data stored in HTML/XML files. Nodes can also be modified and converted back into an HTML string.
Current main features
- Parse attribute values containing spaces into multiple strings
- Search for nodes that have particular attributes
- Change attributes
- Change tag names
- Edit a node's children array
- Convert nodes back to HTML
- Choose between sharable and owned Nodes (with or without Arc, respectively)
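To make the features above concrete, here is a minimal, self-contained sketch of the general idea: a node tree with attribute values split on whitespace, a search by attribute, and conversion back to HTML. It uses only the standard library. `SketchNode` and its methods (`set_attr`, `find_by_attr`, `to_html`) are hypothetical names invented for this illustration; they are not the `htmldom_read` API, and the crate's own types may work differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a parsed element. NOT the htmldom_read `Node` type.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchNode {
    pub tag: String,
    /// Attribute name -> values. A raw value with spaces becomes several strings.
    pub attrs: BTreeMap<String, Vec<String>>,
    /// Text stored directly in this node, if any.
    pub text: Option<String>,
    pub children: Vec<SketchNode>,
}

impl SketchNode {
    pub fn new(tag: &str) -> Self {
        SketchNode {
            tag: tag.to_string(),
            attrs: BTreeMap::new(),
            text: None,
            children: Vec::new(),
        }
    }

    /// Store an attribute, splitting its raw value on whitespace.
    pub fn set_attr(&mut self, name: &str, raw: &str) {
        let values = raw.split_whitespace().map(str::to_string).collect();
        self.attrs.insert(name.to_string(), values);
    }

    /// Depth-first search (including `self`) for nodes whose attribute
    /// `name` contains `value` as one of its split strings.
    pub fn find_by_attr<'a>(&'a self, name: &str, value: &str) -> Vec<&'a SketchNode> {
        let mut found = Vec::new();
        let matches = self
            .attrs
            .get(name)
            .map_or(false, |vs| vs.iter().any(|v| v == value));
        if matches {
            found.push(self);
        }
        for child in &self.children {
            found.extend(child.find_by_attr(name, value));
        }
        found
    }

    /// Serialize the subtree back to HTML. This sketch does no escaping.
    pub fn to_html(&self) -> String {
        let mut out = format!("<{}", self.tag);
        for (name, values) in &self.attrs {
            out.push_str(&format!(" {}=\"{}\"", name, values.join(" ")));
        }
        out.push('>');
        if let Some(text) = &self.text {
            out.push_str(text);
        }
        for child in &self.children {
            out.push_str(&child.to_html());
        }
        out.push_str(&format!("</{}>", self.tag));
        out
    }
}
```

Because the fields are public in this sketch, renaming a tag or editing the children array is a plain field assignment or `Vec` operation; a sharable variant could wrap children in `Arc`, which is left out here for brevity.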
Examples
Load nodes from HTML
use htmldom_read::Node;
let html = r#"
<div><p>Text</p></div>
"#;
// Load with default settings.
let nodes = Node::from_html(html, &Default::default()).unwrap().unwrap();
let first_node = nodes.children().get(0).unwrap();
// First node is <div>
assert_eq!("div", first_node.tag_name().unwrap());
let children = first_node.children();
// First child of <div> is <p>
let first_child = children.get(0).unwrap();
assert_eq!("p", first_child.tag_name().unwrap());
// The child of <p> is Text
assert_eq!("Text", first_child.children().get(0).unwrap().text().unwrap());
Load node with text mixed with children
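Text that is not mixed with child elements is loaded into the parent node itself rather than as a separate child. Text that is mixed with child elements is loaded as separate child nodes.
use htmldom_read::{Node, LoadSettings};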
let html = r#"
<p>Text <sup>child</sup> more text</p>
"#;
let settings = LoadSettings::new().all_text_separately(false);
let from = Node::from_html(html, &settings).unwrap().unwrap();
let node = from.children().get(0).unwrap();
let children = node.children();
let first_text = children.get(0).unwrap();
assert_eq!("Text ", first_text.text().unwrap());
let sup = children.get(1).unwrap();
assert_eq!("child", sup.text().unwrap());
let last_text = children.get(2).unwrap();
assert_eq!(" more text", last_text.text().unwrap());
Dependencies
~5MB
~143K SLoC