#node-tree #node #html #html-parser #tree #reader #parser

htmldom_read

HTML reader that parses the code into an easy-to-use tree

8 releases (4 breaking)

0.5.0 Apr 28, 2019
0.4.0 Apr 14, 2019
0.3.3 Apr 11, 2019
0.2.0 Apr 6, 2019
0.1.0 Apr 5, 2019

#2205 in Data structures

MIT license

47KB
991 lines

HTML reader

Description

This library reads HTML strings and converts them into a node tree. A tree makes it easier to process data stored as HTML/XML files. Nodes can also be modified and converted back into an HTML string.
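To make the idea of a node tree concrete, here is a minimal sketch that uses only the standard library. `SketchNode`, `to_html`, `rename`, and `sample_tree` are hypothetical stand-ins invented for illustration; they are not the types or methods of htmldom_read, and the sketch ignores attributes and text escaping.

```rust
/// Hypothetical stand-in for a parsed node; NOT the htmldom_read `Node` type.
#[derive(Debug, Clone, PartialEq)]
enum SketchNode {
    /// An element with a tag name and child nodes.
    Element { tag: String, children: Vec<SketchNode> },
    /// A run of plain text.
    Text(String),
}

impl SketchNode {
    /// Serialize the node and all of its descendants back into an HTML string.
    /// Text is emitted as-is (no escaping), which is enough for a sketch.
    fn to_html(&self) -> String {
        match self {
            SketchNode::Text(text) => text.clone(),
            SketchNode::Element { tag, children } => {
                let inner: String = children.iter().map(SketchNode::to_html).collect();
                format!("<{0}>{1}</{0}>", tag, inner)
            }
        }
    }

    /// Rename an element in place; text nodes are left untouched.
    fn rename(&mut self, new_tag: &str) {
        if let SketchNode::Element { tag, .. } = self {
            *tag = new_tag.to_string();
        }
    }
}

/// Build the tree for `<div><p>Text</p></div>` by hand.
fn sample_tree() -> SketchNode {
    SketchNode::Element {
        tag: "div".to_string(),
        children: vec![SketchNode::Element {
            tag: "p".to_string(),
            children: vec![SketchNode::Text("Text".to_string())],
        }],
    }
}
```

Calling `sample_tree().to_html()` reproduces the original markup, and calling `rename("section")` on the root before serializing changes only the outer tag. The crate itself builds such a tree from a string; this sketch shows only the tree-and-back half of that round trip.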

Current main features

  • Parse attribute values that contain spaces into multiple strings
  • Search for nodes that have particular attributes
  • Change attributes
  • Change tag names
  • Edit a node's array of children
  • Convert nodes back to HTML
  • Choose between shareable and owned Nodes (with or without Arc, respectively)
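Two of the features above can be illustrated with a short standard-library sketch: splitting an attribute value on whitespace, and the difference between an owned node and one shared through `Arc`. The names `split_attribute_value`, `SharedSketch`, and `share_node` are assumptions made up for this example and are not part of the htmldom_read API.

```rust
use std::sync::Arc;

/// Split an attribute value such as `btn btn-primary` into its
/// whitespace-separated parts. Runs of spaces or tabs count as one separator.
fn split_attribute_value(value: &str) -> Vec<String> {
    value.split_whitespace().map(str::to_string).collect()
}

/// Hypothetical node type holding a tag name and the split values of one attribute.
#[derive(Debug)]
struct SharedSketch {
    tag: String,
    class_values: Vec<String>,
}

/// Take ownership of a node and hand back two handles to the same allocation.
/// An owned node has exactly one holder; wrapping it in `Arc` lets several
/// holders (possibly on different threads) read the same node.
fn share_node(node: SharedSketch) -> (Arc<SharedSketch>, Arc<SharedSketch>) {
    let first = Arc::new(node);
    let second = Arc::clone(&first); // bumps the reference count, no deep copy
    (first, second)
}
```

Sharing through `Arc` costs an atomic reference count but avoids cloning the subtree; an owned node avoids that overhead and can be mutated directly, which is presumably why the library offers both.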

Examples

Load nodes from HTML

use htmldom_read::Node;
let html = r#"
    <div><p>Text</p></div>
"#;
// Load with default settings.
let nodes = Node::from_html(html, &Default::default()).unwrap().unwrap();
let first_node = nodes.children().get(0).unwrap();
// First node is <div>
assert_eq!("div", first_node.tag_name().unwrap());

let children = first_node.children();

// First child of <div> is <p>
let first_child = children.get(0).unwrap();
assert_eq!("p", first_child.tag_name().unwrap());
// The child of <p> is Text
assert_eq!("Text", first_child.children().get(0).unwrap().text().unwrap());

Load node with text mixed with children

Text that is not mixed with child elements is loaded inside the parent node rather than as a separate child.

use htmldom_read::{Node, LoadSettings};
let html = r#"
    <p>Text <sup>child</sup> more text</p>
"#;
let settings = LoadSettings::new().all_text_separately(false);

let from = Node::from_html(html, &settings).unwrap().unwrap();
let node = from.children().get(0).unwrap();
let children = node.children();

let first_text = children.get(0).unwrap();
assert_eq!("Text ", first_text.text().unwrap());

let sup = children.get(1).unwrap();
assert_eq!("child", sup.text().unwrap());

let last_text = children.get(2).unwrap();
assert_eq!(" more text", last_text.text().unwrap());
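The placement rule shown in this example can be modeled in a few lines of standard-library Rust. This is only a sketch of the rule as described above, under the assumption that "mixed" means at least one element appears next to the text; `Piece`, `Loaded`, and `place_text` are hypothetical names and do not reflect how htmldom_read is implemented.

```rust
/// One piece of a parent's content, in document order (hypothetical type).
#[derive(Debug, PartialEq)]
enum Piece {
    /// A run of text directly inside the parent.
    Text(String),
    /// A child element with its own tag and text.
    Element { tag: String, text: String },
}

/// Where the parent's content ends up after loading (hypothetical type).
#[derive(Debug, PartialEq)]
struct Loaded {
    /// Text stored inside the parent node itself.
    text: Option<String>,
    /// Content stored as separate children.
    children: Vec<Piece>,
}

/// Apply the rule: text on its own goes into the parent node,
/// text mixed with elements is kept as separate children.
fn place_text(pieces: Vec<Piece>) -> Loaded {
    let mixed = pieces.iter().any(|p| matches!(p, Piece::Element { .. }));
    if mixed {
        // Keep every piece as its own child so document order is preserved.
        Loaded { text: None, children: pieces }
    } else {
        // Only text: join it and store it on the parent.
        let text: String = pieces
            .iter()
            .filter_map(|p| match p {
                Piece::Text(t) => Some(t.as_str()),
                Piece::Element { .. } => None,
            })
            .collect();
        Loaded {
            text: if text.is_empty() { None } else { Some(text) },
            children: Vec::new(),
        }
    }
}
```

With the content of `<p>Text <sup>child</sup> more text</p>` as input, the sketch yields three children and no parent text, matching the assertions above; with the content of `<p>Text</p>` it yields parent text and no children.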

Dependencies

~5MB
~143K SLoC