
mdurl

URL parser and formatter that gracefully handles invalid input

5 unstable releases

0.3.1 Sep 17, 2022
0.3.0 Sep 16, 2022
0.2.0 Sep 13, 2022
0.1.1 Sep 10, 2022
0.1.0 Sep 10, 2022

#2143 in Parser implementations


1,257 downloads per month
Used in 21 crates (via markdown-it)

MIT license

55KB
894 lines


mdurl is a Rust port of the mdurl.js library, created specifically for URL rendering in the markdown-it parser.

URL formatter

This function takes a URL, decodes it, and fits it into N characters, replacing the rest with the "…" symbol (this is called "URL elision").

This is similar to what Chromium shows in the status bar when you hover over a link.

use mdurl::format_url_for_humans as format;
let url = "https://www.reddit.com/r/programming/comments/vxttiq/\
comment/ifyqsqt/?utm_source=reddit&utm_medium=web2x&context=3";

assert_eq!(format(url, 20), "reddit.com/…/ifyqsq…");
assert_eq!(format(url, 30), "www.reddit.com/r/…/ifyqsqt/?u…");
assert_eq!(format(url, 50), "www.reddit.com/r/programming/comments/…/ifyqsqt/?…");
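
To make the idea of elision concrete, here is a minimal std-only sketch of the simplest possible variant: cut a string to at most N characters and mark the cut with "…". The helper name `elide_end` is hypothetical and is not part of mdurl. The real formatter is smarter: as the assertions above show, it also drops the scheme and elides the middle of the path.

```rust
/// Hypothetical helper for illustration only; not part of the mdurl API.
/// Keeps at most `max_chars` characters of `s`, replacing the cut-off tail
/// with a single "…" character.
fn elide_end(s: &str, max_chars: usize) -> String {
    // Count characters, not bytes, so non-ASCII input is measured correctly.
    let len = s.chars().count();
    if len <= max_chars {
        return s.to_string();
    }
    if max_chars == 0 {
        return String::new();
    }
    // Reserve one character of the budget for the ellipsis itself.
    let mut out: String = s.chars().take(max_chars - 1).collect();
    out.push('…');
    out
}
```

Counting with `chars()` rather than byte length matters here because decoded URLs may contain non-ASCII characters, and slicing a `&str` at an arbitrary byte offset can panic.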

Check out this demo to play around with different URLs and lengths.

The humanize-url crate tries to achieve similar goals; let me know if there are others.

URL parser

To achieve the task above, a new URL parser had to be created, so here it is:

let url = "https://www.reddit.com/r/programming/comments/vxttiq/\
comment/ifyqsqt/?utm_source=reddit&utm_medium=web2x&context=3";
let u = mdurl::parse_url(url);

assert_eq!(u.hostname, Some("www.reddit.com".into()));
assert_eq!(u.pathname, Some("/r/programming/comments/vxttiq/comment/ifyqsqt/".into()));
assert_eq!(u.search, Some("?utm_source=reddit&utm_medium=web2x&context=3".into()));

This function uses a non-standard parsing algorithm derived from the Node.js legacy URL parser.
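
For readers unfamiliar with the field names used above, the following naive std-only sketch shows how a URL string breaks down into `hostname`, `pathname`, and `search`, with `search` keeping its leading `?` as in the assertions. This is not mdurl's algorithm: the `Parts` struct and the `naive_split` function are hypothetical, and the sketch ignores ports, credentials, fragments, and every malformed-input case that the real parser is designed to tolerate.

```rust
/// Hypothetical struct for illustration; field names mirror the example above.
#[derive(Debug, PartialEq)]
struct Parts {
    hostname: Option<String>,
    pathname: Option<String>,
    search: Option<String>,
}

/// Naive splitter, assuming a well-formed `scheme://host/path?query` input.
fn naive_split(url: &str) -> Parts {
    // Skip "scheme://" if present.
    let rest = match url.find("://") {
        Some(i) => &url[i + 3..],
        None => url,
    };
    // Everything from the first '?' onward is the search component.
    let (before, search) = match rest.find('?') {
        Some(i) => (&rest[..i], Some(rest[i..].to_string())),
        None => (rest, None),
    };
    // The first '/' separates the hostname from the pathname.
    let (host, pathname) = match before.find('/') {
        Some(i) => (&before[..i], Some(before[i..].to_string())),
        None => (before, None),
    };
    Parts {
        hostname: if host.is_empty() { None } else { Some(host.to_string()) },
        pathname,
        search,
    }
}
```

Called on the Reddit URL from the example, `naive_split` yields the same three values that the assertions check, but only because that URL is well formed.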

You should probably be using the rust-url crate instead. Unfortunately, it isn't suitable for pretty-printing URLs because you can't customize the parts of the Url returned by that library (for example, rust-url always encodes a non-ASCII hostname with punycode, whereas this implementation does not).
