10 releases (5 breaking)

0.6.0 Oct 2, 2023
0.5.0 Sep 23, 2022
0.4.1 Jan 4, 2021
0.4.0 Dec 2, 2020
0.1.0 Jun 1, 2018

#262 in Algorithms

1,416 downloads per month

GPL-3.0-or-later

66KB
1K SLoC

HTML2Pango

A small Rust library that parses simple HTML tags and converts them to Pango markup. It also turns raw URLs into links and sanitizes the message to strip unwanted tags.

The library is currently in an alpha state: it is a proof of concept for mapping a subset of HTML to Pango markup.

This code originally lived inside the Fractal project, where it parsed matrix.org messages containing links so that they could be drawn in a GtkLabel. We moved it to its own repository so that it can be extended and used in other projects such as Hammond.


lib.rs:

Library for sanitizing and converting HTML strings to something that Pango can render.

This library contains several functions to (pre)process text into Pango Markup. Which one to use depends on the type of input and the desired result, which can range from plain escaping to full conversion and sanitization. See the examples below for what is available for each input type.

The functions below convert strings to strings. If your input can contain several block elements such as headings, lists, code or quote blocks, see the block module to convert an input string into a list of these blocks.

Markdown/body HTML

For HTML beyond simple formatting tags, use the markup_html function. It supports HTML body markup, such as the output of a Markdown-to-HTML conversion. It tries to convert the input to Pango Markup so that the result rendered by Pango resembles what a browser would show. This involves adding newlines for paragraphs and lists, converting font styles, and so on.

let m = markup_html("<body>this is some <font color=\"#ff0000\">red text</font>!</body>").unwrap();
assert_eq!(m, "this is some <span foreground=\"#ff0000\">red text</span>!");

let m = markup_html("<body>a nice <a href=\"https://gnome.org\">link</a>").unwrap();
assert_eq!(m, "a nice <a href=\"https://gnome.org\">link</a>");

let m = markup_html("<body>some items: <ul><li>first</li><li>second</li></ul></body>").unwrap();
assert_eq!(m, "some items: \n • first\n • second\n");
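
The list example above suggests that each list item ends up on its own line behind a bullet character. The following is only a minimal sketch of that idea, not the crate's implementation: render_bullets is a hypothetical helper, and the exact placement of the newlines is an assumption read off the expected output above.

```rust
/// Hypothetical helper (not part of html2pango): renders list items as
/// bullet lines. The leading newline separates the list from the text
/// that precedes it; every item is terminated by its own newline.
fn render_bullets(items: &[&str]) -> String {
    let mut out = String::from("\n");
    for item in items {
        out.push_str(" \u{2022} "); // U+2022 is the bullet character
        out.push_str(item);
        out.push('\n');
    }
    out
}
```

Appending render_bullets(&["first", "second"]) to "some items: " yields the same string as the expected output in the list example.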

Escaping

To just escape any HTML reserved characters, use html_escape:

let s = html_escape("this is a <tag> & this is \"quoted text\"");
assert_eq!(s, "this is a &lt;tag&gt; &amp; this is &quot;quoted text&quot;");
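
To illustrate what this kind of escaping involves, here is a minimal sketch using only the standard library. escape_reserved is a hypothetical stand-in, not the crate's html_escape, and the set of escaped characters (ampersand, angle brackets, double quote) is an assumption based on the example above.

```rust
/// Hypothetical stand-in for an escaping function (not html2pango's
/// html_escape): replaces characters that are reserved in HTML and in
/// Pango markup with their entity references.
fn escape_reserved(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            other => out.push(other),
        }
    }
    out
}
```

Because the input is processed one character at a time, an ampersand produced by an earlier replacement is never escaped a second time.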

Matrix custom HTML

The Matrix specification defines a custom HTML format that restricts which tags and attributes may be used. Use matrix_html_to_markup to handle this custom HTML; it sanitizes the input before converting it.

This function is still a work in progress!

Simple HTML

By simple HTML, we mean plain text that only contains some formatting tags such as <strong>, <i>, <code>, etc. For the full list of supported tags and how they are replaced, see markup_from_raw.

With sanitization

If you use markup, supported tags are replaced (if necessary), malformed tags are removed and HTML reserved characters are escaped.

let m = markup("<p><strong>this <i>is &sanitized<f;><unsupported/></i></strong></p>");
assert_eq!(m, "<b>this <i>is &amp;sanitized</i></b>");

Other tags that are valid but unsupported are escaped.

let m = markup("this is <span>a tag</span>");
assert_eq!(m, "this is &lt;span&gt;a tag&lt;/span&gt;");
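
The behaviour described in this section (replace supported tags, escape everything else) can be sketched with a small scanner. This is a simplified illustration under stated assumptions, not the crate's code: convert_simple, pango_name and escape_text are hypothetical names, the tag table is an illustrative guess apart from the strong-to-b mapping shown above, and attributes, paragraph handling and removal of malformed tags are not covered.

```rust
/// Hypothetical tag table: maps a simple HTML tag name to a Pango tag name.
/// Only the strong -> b mapping is taken from the examples; the other
/// entries are assumptions for illustration.
fn pango_name(html: &str) -> Option<&'static str> {
    match html {
        "strong" | "b" => Some("b"),
        "em" | "i" => Some("i"),
        "code" | "tt" => Some("tt"),
        _ => None,
    }
}

/// Escapes text that must not be interpreted as markup.
fn escape_text(text: &str) -> String {
    text.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
}

/// Replaces known tags by their Pango equivalent and escapes unknown ones.
fn convert_simple(input: &str) -> String {
    let mut out = String::new();
    let mut rest = input;
    while let Some(start) = rest.find('<') {
        // Plain text before the tag is escaped and copied.
        out.push_str(&escape_text(&rest[..start]));
        let after = &rest[start..];
        match after.find('>') {
            Some(end) => {
                let inner = &after[1..end]; // text between '<' and '>'
                let (closing, name) = match inner.strip_prefix('/') {
                    Some(n) => (true, n),
                    None => (false, inner),
                };
                match pango_name(name) {
                    Some(pango) => {
                        out.push('<');
                        if closing {
                            out.push('/');
                        }
                        out.push_str(pango);
                        out.push('>');
                    }
                    // Unknown tag: keep it visible as escaped text.
                    None => {
                        out.push_str("&lt;");
                        out.push_str(&escape_text(inner));
                        out.push_str("&gt;");
                    }
                }
                rest = &after[end + 1..];
            }
            // A '<' without a matching '>' is treated as plain text.
            None => {
                out.push_str(&escape_text(after));
                rest = "";
            }
        }
    }
    out.push_str(&escape_text(rest));
    out
}
```

With this sketch, convert_simple("this is <span>a tag</span>") produces the same escaped string as the markup example above, while a strong element is rewritten to b.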

URIs are replaced by links.

let m = markup("go to: https://gnome.org");
assert_eq!(m, "go to: <a href=\"https://gnome.org\">https://gnome.org</a>");

Without sanitization

Use markup_from_raw if you have already sanitized input:

let m = markup_from_raw("<p>this is <unsupported>already sanitized</unsupported></p>");
assert_eq!(m, "this is &lt;unsupported&gt;already sanitized&lt;/unsupported&gt;");

To just replace URIs by links, use markup_links:

let m = markup_links("go to: https://gnome.org");
assert_eq!(m, "go to: <a href=\"https://gnome.org\">https://gnome.org</a>");
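
As a rough illustration of link replacement, the sketch below wraps every space-separated word that starts with an http or https scheme in an anchor tag. linkify is a hypothetical function and not the crate's markup_links, which may detect URIs differently; this version does not escape the URL and would include trailing punctuation in the link.

```rust
/// Hypothetical sketch (not html2pango's markup_links): wraps words that
/// look like http(s) URLs in an <a> tag and leaves other words untouched.
fn linkify(input: &str) -> String {
    input
        .split(' ')
        .map(|word| {
            if word.starts_with("http://") || word.starts_with("https://") {
                format!("<a href=\"{0}\">{0}</a>", word)
            } else {
                word.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Splitting on single spaces and joining with single spaces keeps the original spacing intact, including runs of several spaces.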

Dependencies

~5–13MB
~158K SLoC