#html-parser #nom #structure #convert #string #tags #line

nom_html_parser

A parser to convert HTML string to HTML tree structure written with Nom

2 releases

0.1.1 Oct 8, 2020
0.1.0 May 3, 2020

#45 in #nom


Used in wasm-component

GPL-3.0 license

35KB
973 lines

Nom HTML Parser

This project is an alpha-stage HTML parser built with nom.

The goal of this crate is to provide a performant runtime HTML parsing library that converts an HTML string into an HTML node structure. The library is currently rather fragile.

Rules to follow:

  • Always put the closing tag on its own line. For the moment, self-closing tags and single-line HTML are not valid.
// good
<div>
</div>

// bad
<div></div>
<input />
  • Always indent the nodes relative to their nesting level.
// good
<div>
  <div>
  </div>
</div>

// bad
<div>
<div>
</div>
</div>
  • Attribute values have to be written in backticks.
// good
<div class=`test`>
</div>

// bad
<div class="test">
</div>
  • Text nodes are final: once a text node starts, all the content after it is treated as text and not as HTML nodes. For example:
<div>
  {{test}}
  {{/test}}
  <p>
    Test
  </p>
</div>

This will result in a Text node containing all the div content. To keep the intended structure, you have to wrap your text node in an HTML element:

<div>
  <span>
    {{test}}
    {{/test}}
  </span>
  <p>
    Test
  </p>
</div>
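
Since the parser is strict about its input, it can help to check a template against the rules above before parsing it. The sketch below is one way to do that with only the standard library. `check_rules` is a hypothetical helper and not part of this crate's API. It assumes two spaces per nesting level, at most one tag per line, and that any line not starting with `<` is text.

```rust
/// Hypothetical pre-check; NOT part of nom_html_parser's API.
/// Returns one message per rule violation found in `html`.
/// Assumptions: two spaces per nesting level (tabs are not handled),
/// and any non-empty line that does not start with `<` is text.
pub fn check_rules(html: &str) -> Vec<String> {
    const INDENT: usize = 2;
    let mut problems = Vec::new();
    let mut depth: usize = 0;
    // Nesting depth at which text was first seen in a still-open element.
    let mut text_at: Option<usize> = None;

    for (idx, line) in html.lines().enumerate() {
        let n = idx + 1;
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue;
        }
        let indent = line.len() - line.trim_start().len();
        let is_tag = trimmed.starts_with('<');
        let is_closing = trimmed.starts_with("</");
        let self_closing = is_tag && trimmed.ends_with("/>");
        let one_liner = is_tag && trimmed.matches('<').count() > 1;

        // Rule 1: no self-closing tags, no open and close on one line.
        if self_closing {
            problems.push(format!("line {n}: self-closing tags are not supported"));
        }
        if one_liner {
            problems.push(format!("line {n}: put the closing tag on its own line"));
        }
        // Rule 3: attribute values use backticks, not quotes.
        if is_tag && (trimmed.contains('"') || trimmed.contains('\'')) {
            problems.push(format!("line {n}: write attribute values in backticks"));
        }

        // A closing tag sits at the indentation of its opening tag.
        if is_closing {
            depth = depth.saturating_sub(1);
            // Leaving the element that contained the text ends the text run.
            if text_at.map_or(false, |d| d > depth) {
                text_at = None;
            }
        }
        // Rule 2: indentation must match the nesting level.
        if indent != depth * INDENT {
            problems.push(format!(
                "line {n}: expected {} spaces of indentation, found {indent}",
                depth * INDENT
            ));
        }

        let opens = is_tag && !is_closing && !self_closing && !one_liner;
        if opens {
            // Rule 4: a tag after text in the same element is read as text.
            if text_at.is_some() {
                problems.push(format!(
                    "line {n}: tag follows text in the same element and would be read as text"
                ));
            }
            depth += 1;
        } else if !is_tag && text_at.is_none() {
            text_at = Some(depth);
        }
    }
    problems
}
```

An empty result means the input follows all four rules as this sketch understands them. It does not guarantee that the parser accepts the input, since the check is line-based and knows nothing about the parser's internals.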

These limitations are temporary and will be fixed. For the moment the library is usable, and I will continue to improve it. Contributions welcome :)

Dependencies

~1MB
~15K SLoC