Version 0.1.0, released Nov 6, 2020 (1 unstable release)
WordPress's Slugify Function, in Rust.
A slug is the part of a URL generated from a hand-written or human-friendly title. To be both readable and search-engine optimized, slugs are generally stripped of all frivolous punctuation and, because they are meant to be used in URLs, of any dangerous HTML and HTML entities.
This algorithm closely matches sanitize_title_with_dashes() in WordPress's formatting.php file, although by leveraging Rust's regex crate and Rust's native Unicode support, I was able to do it in slightly less code. It's not spectacularly efficient, but it works.
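As a rough illustration, the following std-only sketch walks through stages similar to those in sanitize_title_with_dashes(): strip HTML tags, drop HTML entities, lowercase, turn whitespace, hyphens and periods into single hyphens, and discard the remaining punctuation. It is not this crate's code. It scans characters by hand instead of using regular expressions, the function names are hypothetical, and it keeps non-ASCII letters as-is rather than percent-encoding them as WordPress does for UTF-8 input.

```rust
/// Removes anything shaped like an HTML tag: a `<` through the next `>`.
/// Naive by design: this is a character scan, not an HTML parser, so a
/// stray `<` in plain text swallows everything after it.
fn strip_tags(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut in_tag = false;
    for c in s.chars() {
        match c {
            '<' => in_tag = true,
            '>' if in_tag => in_tag = false,
            _ if !in_tag => out.push(c),
            _ => {}
        }
    }
    out
}

/// Drops entity-shaped sequences such as `&amp;` or `&#8217;`: an `&`,
/// one or more ASCII letters, digits or `#`, then a `;`.
/// A bare `&` (as in "Rock & Roll") is left alone.
fn strip_entities(s: &str) -> String {
    let chars: Vec<char> = s.chars().collect();
    let mut out = String::with_capacity(s.len());
    let mut i = 0;
    while i < chars.len() {
        if chars[i] == '&' {
            let mut j = i + 1;
            while j < chars.len() && (chars[j].is_ascii_alphanumeric() || chars[j] == '#') {
                j += 1;
            }
            if j > i + 1 && j < chars.len() && chars[j] == ';' {
                i = j + 1; // skip the whole entity, including the `;`
                continue;
            }
        }
        out.push(chars[i]);
        i += 1;
    }
    out
}

/// Hypothetical name. Lowercases the cleaned title, keeps Unicode letters,
/// digits and `_`, turns whitespace, `-` and `.` into single hyphens, and
/// drops every other character. Leading and trailing hyphens never survive.
fn slugify_title(title: &str) -> String {
    let text = strip_entities(&strip_tags(title)).to_lowercase();
    let mut slug = String::with_capacity(text.len());
    for c in text.chars() {
        if c.is_alphanumeric() || c == '_' {
            slug.push(c);
        } else if c.is_whitespace() || c == '-' || c == '.' {
            // Separator: emit one hyphen, never at the start, never doubled.
            if !slug.is_empty() && !slug.ends_with('-') {
                slug.push('-');
            }
        }
        // Any other punctuation is silently dropped.
    }
    while slug.ends_with('-') {
        slug.pop();
    }
    slug
}
```

For example, slugify_title("<b>Hello</b>, World!") yields "hello-world", while "Café déjà vu" keeps its accents and becomes "café-déjà-vu".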
In the library you'll find a series of tests that show off what it does.
Reasons
Mostly, I needed a slugification function, and the ones I found on crates.io didn't thrill me. URLs and databases are UTF-8 aware these days, yet the most popular slugify crates either transliterate everything to ASCII with deunicode or mangle the text in other ways.
There are two functions in the library: one sanitizes the title and returns a Vec of the resulting words; the other joins those words with hyphens to form a slug. I needed the Vec because I'm using this library to build a trie of titles in a document store, which supports autosuggest and autoreferencing features.
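A minimal sketch of that two-function split, assuming the names sanitize and slugify (both hypothetical; the crate's actual function names and signatures may differ). Here sanitize first deletes apostrophes so contractions stay whole, then treats any character that is not a Unicode letter or digit as a word break; slugify just joins the resulting words with hyphens.

```rust
/// Hypothetical name. Lowercases the title and splits it into words.
/// Apostrophes (ASCII and U+2019) are removed first so "Don't" stays one
/// word; every other non-alphanumeric character acts as a separator.
pub fn sanitize(title: &str) -> Vec<String> {
    title
        .replace(|c: char| c == '\'' || c == '\u{2019}', "")
        .to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty()) // drop empty fragments between separators
        .map(String::from)
        .collect()
}

/// Hypothetical name. Builds the URL slug by hyphen-joining the words.
pub fn slugify(title: &str) -> String {
    sanitize(title).join("-")
}
```

Because sanitize returns the words themselves, a caller can insert the Vec directly into a structure such as a trie keyed on word sequences, while slugify covers the ordinary URL case.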
The repetition of the sanitizing stages also gave me an excuse to refresh my macro_rules! knowledge, since I hadn't used macros much recently and was starting to need them for another project.
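For a sense of how macro_rules! can cut down that kind of repetition, here is a hypothetical pipeline! macro that threads a string through a list of stage functions, left to right. The macro and the four stage functions are illustrative assumptions, not the macros this library actually defines.

```rust
/// Hypothetical macro: `pipeline!(s => a, b, c)` starts from `String::from(s)`
/// and applies each stage in order, so the result is `c(&b(&a(s)))`.
/// Each stage must have the shape `fn(&str) -> String`.
macro_rules! pipeline {
    ($input:expr => $($stage:expr),+ $(,)?) => {{
        let mut value: String = String::from($input);
        $( value = $stage(&value); )+
        value
    }};
}

fn lowercase(s: &str) -> String {
    s.to_lowercase()
}

/// Joins whitespace-separated runs with a single hyphen.
fn spaces_to_hyphens(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join("-")
}

/// Keeps Unicode letters, digits and hyphens; drops everything else.
fn drop_punctuation(s: &str) -> String {
    s.chars().filter(|c| c.is_alphanumeric() || *c == '-').collect()
}

/// Collapses runs of hyphens and trims them from both ends.
fn collapse_hyphens(s: &str) -> String {
    s.split('-').filter(|p| !p.is_empty()).collect::<Vec<_>>().join("-")
}
```

Adding or reordering a stage is then a one-word change at the call site, for example pipeline!("Hello,   World!" => lowercase, spaces_to_hyphens, drop_punctuation, collapse_hyphens), which produces "hello-world".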
LICENSE
- WordPress™ is a trademark of Automattic, Inc.
This slugification library is Copyright (c) 2019 Elf M. Sternberg and is licensed under the Mozilla Public License, version 2.0. A copy of the license file is included in the docs/ folder.