#archive #normalization #surt #web-archiving

bin+lib surt-rs

A Rust implementation of the Sort-friendly URI Reordering Transform (SURT)

2 releases

0.1.1 Apr 1, 2024
0.1.0 Mar 23, 2024

#448 in Text processing


298 downloads per month

MIT license

13KB
245 lines

Rust SURT

This library generates a Sort-friendly URI Reordering Transform (SURT) from a given URL. SURTs are used mainly in web archiving, where they provide a normalised, sortable form of a URL for use at replay time.

Usage

use surt_rs::generate_surt;

let url = "http://example.com/path?query=value#fragment";
let surt = generate_surt(url).unwrap();
println!("{}", surt);  // prints: "com,example)/path?query=value#fragment"
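
The host part of the output above, "com,example)", is the URL's host with its dot-separated labels reversed and joined by commas. The sketch below illustrates only that reordering step and why it helps sorting. Both `reverse_host` and `sort_by_reversed_host` are hypothetical helpers written for this example; they are not part of surt-rs and do not reflect how the crate is implemented.

```rust
/// Hypothetical helper (not part of surt-rs): reverse the dot-separated
/// labels of a host, join them with commas, and close with ')'.
fn reverse_host(host: &str) -> String {
    let mut labels: Vec<&str> = host.split('.').collect();
    labels.reverse();
    format!("{})", labels.join(","))
}

/// Hypothetical helper: sort hosts by their reversed-label key, so that
/// hosts under the same domain end up next to each other.
fn sort_by_reversed_host(hosts: &mut [&str]) {
    hosts.sort_by_key(|h| reverse_host(h));
}
```

Sorting by the reversed key groups "example.com" with "blog.example.com" rather than scattering them by their leftmost label, which is the property that makes SURTs convenient as index keys.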

Functions

generate_surt(url: &str) -> Result<String, ParseError>

Generates a SURT from the given URL. Returns Ok containing the SURT as a String if the URL is valid, or a ParseError otherwise.

normalize_surt(surt: &str) -> String

Normalizes the given SURT by replacing whitespace with '%20' and removing trailing slashes, except when the path is the root path.
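
As a rough illustration of the two rules just described, here is a standalone sketch using only the standard library. `normalize_surt_sketch` is a hypothetical name, and the sketch rests on two assumptions not stated in the README: that every whitespace character (not only the space) becomes '%20', and that "the root path" means a SURT ending in ")/". The crate's actual behaviour may differ.

```rust
/// Hypothetical re-creation of the documented rules; not the crate's code.
fn normalize_surt_sketch(surt: &str) -> String {
    // Assumption: every whitespace character is replaced with "%20".
    let encoded: String = surt
        .chars()
        .map(|c| {
            if c.is_whitespace() {
                "%20".to_string()
            } else {
                c.to_string()
            }
        })
        .collect();

    // Assumption: a SURT ending in ")/" is the root path and keeps its slash.
    let mut out = encoded.as_str();
    while out.ends_with('/') && !out.ends_with(")/") {
        out = &out[..out.len() - 1];
    }
    out.to_string()
}
```

Under these assumptions, "com,example)/a b/" becomes "com,example)/a%20b", while "com,example)/" is returned unchanged.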

normalize_url(url: &str) -> String

Normalizes the given URL by removing trailing slashes and stripping a 'www.' subdomain that directly follows the scheme.
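
The same two steps can be sketched with plain string operations. `normalize_url_sketch` is a hypothetical function written for illustration, assuming that "after the scheme" means immediately after the "://" separator; it is not the crate's implementation and ignores cases such as userinfo or uppercase "WWW.".

```rust
/// Hypothetical re-creation of the documented rules; not the crate's code.
fn normalize_url_sketch(url: &str) -> String {
    // Drop any run of trailing slashes.
    let trimmed = url.trim_end_matches('/');

    // Assumption: only a "www." that immediately follows "://" is removed.
    match trimmed.split_once("://") {
        Some((scheme, rest)) => {
            let rest = rest.strip_prefix("www.").unwrap_or(rest);
            format!("{}://{}", scheme, rest)
        }
        // No scheme separator found: return the trimmed input as-is.
        None => trimmed.to_string(),
    }
}
```

With this sketch, "http://www.example.com/" becomes "http://example.com", and a "www." appearing later in the URL, as in "http://example.com/www.html", is left alone.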

License

This project is licensed under the MIT License.

Dependencies

~3.5–5MB
~113K SLoC