#web-archive #normalization #archive #surt #web-archiving

bin+lib surt-rs

A Rust implementation of the Sort-friendly URI Reordering Transform (SURT)

4 releases

0.1.3 Jul 2, 2024
0.1.2 Jun 4, 2024
0.1.1 Apr 1, 2024
0.1.0 Mar 23, 2024

MIT license

13KB
245 lines

Rust SURT

This library provides a Rust implementation for generating a Sort-friendly URI Reordering Transform (SURT) from a given URL. SURTs are used predominantly in web archiving, where they provide a normalised, sortable variant of a URL for use at replay time.

Usage

use surt_rs::generate_surt;

fn main() {
    let url = "http://example.com/path?query=value#fragment";
    // unwrap() panics if the URL cannot be parsed; handle the Result in real code.
    let surt = generate_surt(url).unwrap();
    println!("{}", surt);  // prints: "com,example)/path?query=value#fragment"
}
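
To make the shape of that output easier to read, here is a minimal standard-library sketch of the reordering idea: the host labels are reversed and joined with commas, then closed with ")" before the rest of the URL. The names reverse_host_labels and surt_shape are hypothetical and are not part of surt-rs; the sketch ignores ports, userinfo, case folding and the other rules a real implementation has to handle.

```rust
/// Hypothetical helper (not part of surt-rs): reverses the dot-separated
/// labels of a host and joins them with commas, e.g. "example.com" -> "com,example".
fn reverse_host_labels(host: &str) -> String {
    host.split('.').rev().collect::<Vec<_>>().join(",")
}

/// Hypothetical sketch of the SURT shape: "scheme://host/rest" -> "reversed,host)/rest".
/// Returns None when there is no "://" separator or the host is empty.
fn surt_shape(url: &str) -> Option<String> {
    // Drop the scheme; only the part after "://" is used.
    let (_scheme, rest) = url.split_once("://")?;
    // Split the remainder into host and everything from the first '/' onwards.
    let (host, tail) = match rest.find('/') {
        Some(i) => (&rest[..i], &rest[i..]),
        None => (rest, "/"),
    };
    if host.is_empty() {
        return None;
    }
    Some(format!("{}){}", reverse_host_labels(host), tail))
}
```

Because the most significant host label comes first, URLs from the same domain and its subdomains sort next to each other, which is what makes the form useful as an index key.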

Functions

generate_surt(url: &str) -> Result<String, ParseError>

Generates a SURT from the given URL. Returns Ok containing the SURT as a String if the URL parses successfully, or Err containing a ParseError if it does not.
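
When processing many URLs, it is usually preferable to handle the Result rather than call unwrap. The sketch below is one way to do that: a hypothetical helper, partition_surts, that is generic over any function shaped like generate_surt, so it compiles without the crate. It assumes the error type implements Display, which has not been checked against the crate's ParseError.

```rust
use std::fmt::Display;

/// Hypothetical helper (not part of surt-rs): applies a SURT generator to each
/// URL and separates successful SURTs from failure messages.
/// `generate` is any function with the shape `Fn(&str) -> Result<String, E>`.
fn partition_surts<F, E>(urls: &[&str], generate: F) -> (Vec<String>, Vec<String>)
where
    F: Fn(&str) -> Result<String, E>,
    E: Display, // assumption: the error type can be formatted for a message
{
    let mut surts = Vec::new();
    let mut failures = Vec::new();
    for &url in urls {
        match generate(url) {
            Ok(surt) => surts.push(surt),
            // Keep the offending URL next to the error so it can be reported later.
            Err(err) => failures.push(format!("{}: {}", url, err)),
        }
    }
    (surts, failures)
}
```

With the crate in scope, a call would presumably look like partition_surts(&urls, generate_surt), again assuming ParseError implements Display.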

normalize_surt(surt: &str) -> String

Normalizes the given SURT by replacing whitespace with '%20' and removing trailing slashes, except when the path is the root path.
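
As an illustration of the rule described above, the following standard-library sketch applies the same two steps. The function normalize_surt_sketch is hypothetical and is not the crate's implementation; in particular, it assumes that "root path" means a SURT ending in ")/", and that every Unicode whitespace character is replaced.

```rust
/// Hypothetical sketch (not the crate's code) of the normalisation described above.
fn normalize_surt_sketch(surt: &str) -> String {
    // Step 1: replace each whitespace character with the percent-encoded space.
    let mut out = String::with_capacity(surt.len());
    for c in surt.chars() {
        if c.is_whitespace() {
            out.push_str("%20");
        } else {
            out.push(c);
        }
    }
    // Step 2: strip trailing slashes, but keep the single slash of a
    // root path, assumed here to be a SURT that ends in ")/".
    while out.ends_with('/') && !out.ends_with(")/") {
        out.pop();
    }
    out
}
```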

normalize_url(url: &str) -> String

Normalizes the given URL by removing trailing slashes and stripping a 'www.' subdomain that directly follows the scheme.
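
The behaviour described above can be sketched with the standard library alone. The function normalize_url_sketch is hypothetical and only approximates the description; how the crate treats edge cases such as a missing scheme or an upper-case 'WWW.' is not covered here.

```rust
/// Hypothetical sketch (not the crate's code) of the URL normalisation described above.
fn normalize_url_sketch(url: &str) -> String {
    // Remove any run of trailing slashes.
    let trimmed = url.trim_end_matches('/');
    match trimmed.split_once("://") {
        Some((scheme, rest)) => {
            // Strip "www." only when it directly follows the scheme separator.
            let rest = rest.strip_prefix("www.").unwrap_or(rest);
            format!("{}://{}", scheme, rest)
        }
        // Assumption: without a scheme separator, only the slashes are trimmed.
        None => trimmed.to_string(),
    }
}
```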

License

This project is licensed under the MIT License.

Dependencies

~4.5–6.5MB
~114K SLoC