10 releases (5 breaking)
Version | Released |
---|---|
0.6.0 | Oct 2, 2023 |
0.5.0 | Sep 23, 2022 |
0.4.1 | Jan 4, 2021 |
0.4.0 | Dec 2, 2020 |
0.1.0 | Jun 1, 2018 |
HTML2Pango
A small Rust library that parses simple HTML tags and converts them to Pango markup. It also turns raw URLs into links and sanitizes the message to strip unwanted tags.
The library is currently in alpha: it is a proof of concept for mapping a subset of HTML to Pango markup.
This code originally lived inside the Fractal project, where it parsed matrix.org messages containing links so that they could be drawn in a GtkLabel. We moved it to this repository so that it can be extended and used in other projects such as Hammond.
lib.rs:
Library for sanitizing and converting HTML strings to something that Pango can render.
This library contains several functions to (pre)process text to Pango Markup. What to use and when depends on the type of input and the desired result. This can range from just escaping to converting and sanitizing. See the examples below for what is available based on the input type.
The functions below convert strings to strings. If your input can contain several block elements such as headings, lists, code or quote blocks, see the block module to convert an input string into a list of these blocks.
Markdown/body HTML
To handle more HTML, use the markup_html function. It supports HTML body markup, such as the HTML produced by a Markdown-to-HTML conversion. It tries to convert the input to Pango Markup so that Pango renders it similarly to how a browser would. This involves adding newlines for paragraphs and lists, converting font styles, etc.
let m = markup_html("<body>this is some <font color=\"#ff0000\">red text</font>!</body>").unwrap();
assert_eq!(m, "this is some <span foreground=\"#ff0000\">red text</span>!");
let m = markup_html("<body>a nice <a href=\"https://gnome.org\">link</a>").unwrap();
assert_eq!(m, "a nice <a href=\"https://gnome.org\">link</a>");
let m = markup_html("<body>some items: <ul><li>first</li><li>second</li></ul><body").unwrap();
assert_eq!(m, "some items: \n • first\n • second\n");
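Pango Markup has no list element, so a list has to become plain text with newlines and bullet characters, as in the last example above. The following is a minimal sketch of that idea only, not the library's implementation: render_list is a hypothetical helper, and it assumes the list items have already been extracted from the HTML.

```rust
/// Hypothetical helper (not part of html2pango): render already-extracted
/// list items as plain text, one bullet per line, in the same shape as the
/// `markup_html` list example above.
fn render_list(items: &[&str]) -> String {
    // A list starts on a new line.
    let mut out = String::from("\n");
    for item in items {
        out.push_str(" • ");
        out.push_str(item);
        out.push('\n');
    }
    out
}
```

Appending render_list(&["first", "second"]) to "some items: " gives the same text as the list example above.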
Escaping
To just escape any HTML reserved characters, use html_escape:
let s = html_escape("this is a <tag> & this is \"quoted text\"");
assert_eq!(s, "this is a &lt;tag&gt; &amp; this is &quot;quoted text&quot;");
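To show what escaping means here, the sketch below replaces the four reserved characters from the example above with their entities, using only the standard library. escape_reserved is a hypothetical name for this illustration. Whether html_escape handles further characters (for example apostrophes) is not covered here.

```rust
/// Hypothetical helper (not part of html2pango): replace the characters
/// that are reserved in HTML and Pango Markup with their entities.
fn escape_reserved(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            // '&' needs no special ordering: each input character is
            // visited exactly once, so entities are never double-escaped.
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            other => out.push(other),
        }
    }
    out
}
```

Because the result contains no raw <, > or &, it can be handed to Pango as markup without being interpreted as tags.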
Matrix custom HTML
The Matrix specification defines a custom HTML format that restricts which tags and attributes can be used. Use matrix_html_to_markup to handle this custom HTML input, so that the input is sanitized before it is converted.
This function is still a work in progress!
Simple HTML
By simple HTML, we mean plain text that only contains some formatting tags such as <strong>, <i>, <code>, etc.
For the full list of supported tags and how they are replaced, see markup_from_raw.
With sanitization
If you use markup, supported tags are replaced (if necessary), malformed tags are removed and HTML reserved characters are escaped.
let m = markup("<p><strong>this <i>is &sanitized<f;><unsupported/></i></strong></p>");
assert_eq!(m, "<b>this <i>is &amp;sanitized</i></b>");
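The tag replacement step can be pictured as a small lookup table. The sketch below is an illustration only, not how the library is implemented. It covers just the two replacements visible in the examples (<strong> becomes <b>, <p> is dropped) and does no sanitizing or escaping. map_simple_tags and its table are hypothetical. For the actual list of supported tags, see markup_from_raw.

```rust
/// Hypothetical helper (not part of html2pango): rewrite a few HTML tags
/// into their Pango Markup counterparts by plain string replacement.
fn map_simple_tags(input: &str) -> String {
    // (HTML tag, Pango replacement). An empty replacement drops the tag.
    const REPLACEMENTS: [(&str, &str); 4] = [
        ("<strong>", "<b>"),
        ("</strong>", "</b>"),
        ("<p>", ""),
        ("</p>", ""),
    ];
    let mut out = input.to_string();
    for &(from, to) in REPLACEMENTS.iter() {
        out = out.replace(from, to);
    }
    out
}
```

Plain string replacement only matches tags written exactly as in the table. Input with attributes, different casing or malformed tags needs a real parser and sanitizer, which is what markup provides.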
Other unsupported, but valid tags are escaped.
let m = markup("this is <span>a tag</span>");
assert_eq!(m, "this is &lt;span&gt;a tag&lt;/span&gt;");
URIs are replaced by links.
let m = markup("go to: https://gnome.org");
assert_eq!(m, "go to: <a href=\"https://gnome.org\">https://gnome.org</a>");
Without sanitization
Use markup_from_raw if you have already sanitized input:
let m = markup_from_raw("<p>this is <unsupported>already sanitized</unsupported></p>");
assert_eq!(m, "this is &lt;unsupported&gt;already sanitized&lt;/unsupported&gt;");
Links
To just replace URIs by links, use markup_links:
let m = markup_links("go to: https://gnome.org");
assert_eq!(m, "go to: <a href=\"https://gnome.org\">https://gnome.org</a>");
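As a rough illustration of what replacing URIs by links involves, the sketch below wraps every space-separated token that starts with http:// or https:// in an <a> tag. It is a simplification under stated assumptions, not the library's matching logic: link_uris is a hypothetical helper, it does not handle trailing punctuation or other URI schemes, and it does not escape the URI.

```rust
/// Hypothetical helper (not part of html2pango): wrap space-separated
/// tokens that look like http(s) URIs in an <a> tag.
fn link_uris(input: &str) -> String {
    input
        .split(' ')
        .map(|token| {
            if token.starts_with("http://") || token.starts_with("https://") {
                // The URI is used both as the target and as the link text.
                format!("<a href=\"{0}\">{0}</a>", token)
            } else {
                token.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Splitting on single spaces and joining with single spaces keeps the original spacing intact, so text without URIs passes through unchanged.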