#hyperlink #markdown #restructuredtext #markup #parser

parse-hyperlinks

A Nom parser library for hyperlinks with markup

40 releases

Uses new Rust 2024

0.29.0 May 15, 2025
0.27.2 Oct 19, 2023
0.23.4 Jul 30, 2022
0.23.3 Sep 26, 2021
0.11.0 Nov 30, 2020

#257 in Parser implementations

24,882 downloads per month
Used in 13 crates (10 directly)

MIT/Apache

250KB
4.5K SLoC

Parse hyperlinks

Parse-Hyperlinks is a parser library, written with Nom, that recognizes hyperlinks and link reference definitions in Markdown, reStructuredText, Asciidoc and HTML formatted text input.

The library implements the CommonMark Specification 0.30, the reStructuredText Markup Specification (revision 8571, 2020-10-28), the specifications in the Asciidoctor User Manual, chapter 26 (2020-12-03), and HTML 5.2, section 4.5.

To illustrate the usage and the API of the library, Parse-Hyperlinks comes with a simple command-line application: Atext2html.

Input contract

  1. All input is UTF-8 encoded.

  2. The input text is formatted according to one of the markup language specifications above. As Parse-Hyperlinks ignores most of the markup, it relies solely on the hyperlink specification of the respective markup language.

Additional input contract for HTML documents

  1. The characters &<>" in absolute URLs in HTML documents must be HTML-escape-encoded: these characters are replaced with their entity names &amp;, &lt;, &gt; and &quot;.

  2. Relative URLs (local links) in UTF-8 encoded HTML documents do not need to be HTML-escape-encoded. I recommend not encoding them.

  3. Relative URLs (local links) must not start with a scheme, e.g. http:.

  4. In addition to the HTML-escape encoding discussed above, URLs can also be percent-encoded, e.g. %20 or %26. When both encodings appear in an HTML document, HTML-escape decoding is applied first, then percent decoding. For example, the encoded string Ü ber%26amp;Über &amp is decoded to Ü ber&amp;Über &. In general, avoid percent encoding: URLs in UTF-8 HTML documents can always be expressed without it.
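The following Rust sketch illustrates items 3 and 4 of the HTML input contract. It is not part of the Parse-Hyperlinks API: html_unescape, percent_decode, decode_href and starts_with_scheme are hypothetical, std-only helpers written for this example, and html_unescape only handles the four entities &amp;, &lt;, &gt; and &quot; named above. The scheme check assumes the RFC 3986 scheme syntax (a letter followed by letters, digits, "+", "-" or ".", then ":").

```rust
/// Hypothetical helper: undo the HTML-escape encoding of the four
/// characters &<>" (any other entity is left untouched).
fn html_unescape(s: &str) -> String {
    // `&amp;` is replaced last, so `&amp;lt;` becomes `&lt;` and not `<`.
    s.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&amp;", "&")
}

/// Value of one ASCII hex digit, or `None` for any other byte.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Hypothetical helper: decode `%XX` sequences byte-wise; a `%` that is
/// not followed by two hex digits is kept literally.
fn percent_decode(s: &str) -> String {
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push((hi << 4) | lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    // Decoded bytes may form multi-byte UTF-8 sequences such as %C3%9C.
    String::from_utf8_lossy(&out).into_owned()
}

/// Item 4: HTML-escape decoding first, then percent decoding.
fn decode_href(s: &str) -> String {
    percent_decode(&html_unescape(s))
}

/// Item 3: true if the URL starts with an RFC 3986 style scheme such as `http:`.
fn starts_with_scheme(url: &str) -> bool {
    match url.split_once(':') {
        Some((scheme, _)) => {
            let mut chars = scheme.chars();
            match chars.next() {
                Some(first) if first.is_ascii_alphabetic() => {
                    chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'))
                }
                _ => false,
            }
        }
        None => false,
    }
}

fn main() {
    let href = "a%26amp;b&amp;";
    // Correct order: HTML-escape decoding, then percent decoding.
    println!("{}", decode_href(href)); // a&amp;b&
    // Reverse order decodes the `&amp;` produced by `%26amp;` a second time.
    println!("{}", html_unescape(&percent_decode(href))); // a&b&
    println!("{}", starts_with_scheme("http://getreu.net")); // true
    println!("{}", starts_with_scheme("./doc.html")); // false
}
```

Note that this simple check also classifies a Windows-style path such as C:/notes.md as having a scheme; a real input validator would have to decide how to treat such cases.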

The following list explains how Parse-Hyperlinks meets the HTML requirements above. Its items correspond to the items of the list above.

  1. Only the functions in the renderer module HTML-escape-encode absolute URLs in HTML documents: the characters &<>" are replaced with their entity names &amp;, &lt;, &gt; and &quot;. All other parsers and iterators do not HTML-escape-encode absolute URLs.

  2. No function, parser or iterator in Parse-Hyperlinks HTML-escape-encodes relative URLs.

  3. Parse-Hyperlinks does not enforce this property; compliance depends on the parser's input.

  4. Percent-encoding in Parse-Hyperlinks:

    • Percent encoding: Parse-Hyperlinks never applies percent encoding.

    • Percent decoding: when the markup language specification requires the input URL to be percent-encoded, the consuming parser decodes it automatically. URLs are percent-decoded implicitly when consuming:

      • Markdown autolinks when parsed by: md_text2dest(),
      • Asciidoc URLs when parsed by: adoc_label2dest() or adoc_text2dest(),
      • WikiText URLs when parsed by: wikitext_text2dest().
    • Rendered autolink markup:

      1. The same Markdown input may be rendered into different HTML depending on the renderer. For example, pulldown-cmark renders the Markdown autolink <http://getreu.net/Ü%20&> into <a href="http://getreu.net/%C3%9C%20&amp;">http://getreu.net/Ü%20&amp;</a>.

        • Observation 1: the rendition contains percent and HTML escape codes.
        • Observation 2: the link destination (http://getreu.net/%C3%9C%20&amp;) and the link text (http://getreu.net/Ü%20&amp;) differ slightly, which must be taken into account when detecting autolinks based on the HTML rendition.
      2. For the same input <http://getreu.net/Ü%20&>, the Parse-Hyperlinks Markdown renderer produces a slightly different result: <a href="http://getreu.net/Ü%20&amp;">http://getreu.net/Ü &amp;</a>. Explanation: first the parser md_text2dest() percent-decodes the URL to http://getreu.net/Ü &, then the renderer function in the module renderer HTML-escape-encodes the result into http://getreu.net/Ü%20&amp;.
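The link text in the Parse-Hyperlinks rendition of the autolink example results from composing two steps: percent decoding, as done by md_text2dest(), followed by HTML-escape encoding, as done by the renderer module. The Rust sketch below reproduces only that link text. The functions percent_decode, html_escape and link_text are hypothetical, std-only stand-ins written for this illustration, not the crate's own functions, and the sketch does not model how the href attribute is produced.

```rust
/// Value of one ASCII hex digit, or `None` for any other byte.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Stand-in for the parser step: decode `%XX` sequences byte-wise.
fn percent_decode(s: &str) -> String {
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push((hi << 4) | lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// Stand-in for the renderer step: HTML-escape-encode the characters &<>".
fn html_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            _ => out.push(c),
        }
    }
    out
}

/// Hypothetical composition: percent-decode first, then HTML-escape-encode.
fn link_text(autolink_dest: &str) -> String {
    html_escape(&percent_decode(autolink_dest))
}

fn main() {
    println!("{}", link_text("http://getreu.net/Ü%20&")); // http://getreu.net/Ü &amp;
}
```

Because percent decoding runs first, an encoded %26 in the autolink destination ends up as &amp; in the link text, just like a literal &.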

Dependencies

~1.1–1.7MB
~36K SLoC