4 releases (2 stable)
| Version | Released |
|---|---|
| 1.1.0 | Oct 5, 2025 |
| 1.0.0 | Sep 23, 2025 |
| 0.2.0 | Sep 21, 2025 |
| 0.1.0 | Sep 20, 2025 |
SXURL: Slice eXact URL Identifier ("sixerl")
SXURL (pronounced "Sixerl") is a fixed-length, sliceable URL identifier system designed for efficient database storage and querying. It converts URLs into deterministic 256-bit identifiers where each URL component occupies a fixed position, enabling fast substring-based filtering and indexing.
Features
- Fixed-length: All SXURL identifiers are exactly 256 bits (64 hex characters)
- Sliceable: Each URL component has a fixed position for substring filtering
- Deterministic: Same input always produces the same output
- Collision-resistant: Uses SHA-256 hashing for component fingerprinting
- Standards-compliant: Supports IDNA, Public Suffix List, and standard URL schemes
- Zero-copy: Efficient parsing and encoding with minimal allocations
- Thoroughly tested: 100+ comprehensive tests covering edge cases
Quick Start
Add this to your Cargo.toml:
[dependencies]
sxurl = "1.1"
Basic Usage
use sxurl::{encode_url_to_hex, decode_hex, matches_component};
// Encode a URL to SXURL
let sxurl_hex = encode_url_to_hex("https://docs.rs/serde")?;
println!("SXURL: {}", sxurl_hex); // 64 hex characters
// Decode and inspect components
let decoded = decode_hex(&sxurl_hex)?;
println!("Scheme: {}", decoded.header.scheme);
println!("Has subdomain: {}", decoded.header.subdomain_present);
// Fast component matching for filtering
let is_docs_rs = matches_component(&sxurl_hex, "domain", "docs")?;
assert!(is_docs_rs);
let is_rs_tld = matches_component(&sxurl_hex, "tld", "rs")?;
assert!(is_rs_tld);
URL Parsing Utilities
use sxurl::{split_url, parse_query, get_anchor, strip_anchor, join_url_path};
// Parse URL into all components at once
let parts = split_url("https://api.github.com/repos?page=1#readme")?;
println!("Domain: {}", parts.domain); // "github"
println!("Subdomain: {:?}", parts.subdomain); // Some("api")
println!("Anchor: {:?}", parts.anchor); // Some("readme")
// Work with query parameters
let params = parse_query("https://example.com?foo=bar&page=2")?;
println!("Page: {:?}", params.get("page")); // Some("2")
// Handle anchors (fragments)
let anchor = get_anchor("https://docs.rs#search")?; // Some("search")
let clean_url = strip_anchor("https://example.com#top")?; // "https://example.com"
// Join URLs properly
let api_url = join_url_path("https://api.example.com/v1", "users")?;
// Result: "https://api.example.com/v1/users"
Database Integration Example
use sxurl::encode_url_to_hex;
// Store URLs efficiently in your database
let urls = vec![
"https://docs.rs/serde",
"https://crates.io/crates/tokio",
"https://github.com/rust-lang/rust",
];
for url in urls {
let sxurl = encode_url_to_hex(url)?;
// Store sxurl as CHAR(64) or BINARY(32) in your database
println!("INSERT INTO urls (sxurl, original_url) VALUES ('{}', '{}')", sxurl, url);
}
// Query by domain efficiently using substring matching
// SELECT * FROM urls WHERE sxurl LIKE '_______________example_______________'
// ^domain slice position (chars 7-22)
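The LIKE pattern for a domain query can also be built on the Rust side. The following is a minimal sketch, assuming the domain slice sits at hex range [7..22) as described in the SXURL Format section and that stored hex strings are lowercase; `domain_like_pattern` is a hypothetical helper, not part of the crate's API.

```rust
/// Total length of an SXURL hex string.
const SXURL_HEX_LEN: usize = 64;
/// Hex range of the domain slice, [7..22).
const DOMAIN_START: usize = 7;
const DOMAIN_END: usize = 22;

/// Build a SQL LIKE pattern matching any SXURL whose domain slice equals
/// `domain_hash_hex`. Hypothetical helper for illustration only.
fn domain_like_pattern(domain_hash_hex: &str) -> Option<String> {
    let width = DOMAIN_END - DOMAIN_START;
    // The slice must be exactly 15 hex digits; any other length would
    // shift every later component out of position.
    if domain_hash_hex.len() != width
        || !domain_hash_hex.bytes().all(|b| b.is_ascii_hexdigit())
    {
        return None;
    }
    let mut pattern = String::with_capacity(SXURL_HEX_LEN);
    // `_` matches exactly one character in SQL LIKE.
    pattern.push_str(&"_".repeat(DOMAIN_START));
    // Assumption: stored SXURLs use lowercase hex.
    pattern.push_str(&domain_hash_hex.to_ascii_lowercase());
    pattern.push_str(&"_".repeat(SXURL_HEX_LEN - DOMAIN_END));
    Some(pattern)
}
```

Pass the resulting pattern as a bound parameter rather than splicing it into the SQL text. A pattern with leading wildcards generally cannot use a plain B-tree index on the full column, so an expression index on the slice is likely the better fit for large tables.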
SXURL Format
The 256-bit SXURL has this fixed layout:
| Component | Hex Range | Bits | Description |
|---|---|---|---|
| header | [0..3) | 12 | Version, scheme, flags |
| tld_hash | [3..7) | 16 | Top-level domain hash |
| domain | [7..22) | 60 | Domain name hash |
| subdomain | [22..30) | 32 | Subdomain hash |
| port | [30..34) | 16 | Port number |
| path | [34..49) | 60 | Path hash |
| params | [49..58) | 36 | Query parameters hash |
| fragment | [58..64) | 24 | Fragment hash |
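Because every component has a fixed hex range, a component can be read with plain string slicing. The sketch below copies the ranges from the table above; `slice_component` is a hypothetical helper for illustration and is not taken from the crate.

```rust
/// Component name and half-open hex range, copied from the layout table.
const LAYOUT: [(&str, usize, usize); 8] = [
    ("header", 0, 3),
    ("tld_hash", 3, 7),
    ("domain", 7, 22),
    ("subdomain", 22, 30),
    ("port", 30, 34),
    ("path", 34, 49),
    ("params", 49, 58),
    ("fragment", 58, 64),
];

/// Return the hex slice for `component`, or `None` if the name is unknown
/// or the input is not a 64-character hex string.
fn slice_component<'a>(sxurl_hex: &'a str, component: &str) -> Option<&'a str> {
    if sxurl_hex.len() != 64 || !sxurl_hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    LAYOUT
        .iter()
        .find(|(name, _, _)| *name == component)
        .map(|&(_, start, end)| &sxurl_hex[start..end])
}
```

Each hex character carries 4 bits, so the widths in the Bits column are four times the slice lengths, and the eight slices add up to 256 bits.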
Header Format (12 bits)
- Version (4 bits): Currently always 1
- Scheme (3 bits): 0=https, 1=http, 2=ftp
- Flags (5 bits): Component presence indicators
  - Bit 4: Subdomain present
  - Bit 3: Query parameters present
  - Bit 2: Fragment present
  - Bit 1: Non-default port present
  - Bit 0: Reserved (always 0)
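The header fields can be packed into and unpacked from a 12-bit value with shifts and masks. This is a sketch under one stated assumption: the version occupies the most significant 4 bits, followed by the scheme, with the flags in the low 5 bits. The field order is not confirmed here, so check SXURL-SPEC.md before relying on it. The function and constant names are hypothetical.

```rust
/// Flag bit positions within the 5-bit flags field, as listed above.
const FLAG_SUBDOMAIN: u16 = 1 << 4;
const FLAG_PARAMS: u16 = 1 << 3;
const FLAG_FRAGMENT: u16 = 1 << 2;
const FLAG_PORT: u16 = 1 << 1;

/// Pack version, scheme, and flags into a 12-bit header.
/// ASSUMPTION: layout is [version:4][scheme:3][flags:5], most significant first.
fn pack_header(version: u16, scheme: u16, flags: u16) -> Option<u16> {
    // Reject values that do not fit their field, and the reserved bit 0.
    if version > 0xF || scheme > 0x7 || flags > 0x1F || (flags & 1) != 0 {
        return None;
    }
    Some((version << 8) | (scheme << 5) | flags)
}

/// Split a 12-bit header back into (version, scheme, flags).
fn unpack_header(header: u16) -> (u16, u16, u16) {
    ((header >> 8) & 0xF, (header >> 5) & 0x7, header & 0x1F)
}
```

Under this assumed layout, an https URL (scheme 0) with only the subdomain flag set packs to the three hex digits `110`.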
Advanced Usage
Custom Encoder
use sxurl::SxurlEncoder;
let encoder = SxurlEncoder::new();
// Encode to bytes
let sxurl_bytes = encoder.encode("https://example.com")?;
assert_eq!(sxurl_bytes.len(), 32); // Always 32 bytes
// Encode to hex string
let sxurl_hex = encoder.encode_to_hex("https://example.com")?;
assert_eq!(sxurl_hex.len(), 64); // Always 64 hex chars
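The 32-byte and 64-character forms carry the same information, so a database can store either. The conversion between them can be sketched with the standard library alone. This assumes lowercase hex with the first byte mapped to the first two characters; `to_hex` and `from_hex` are hypothetical helpers, not the crate's own functions.

```rust
/// Encode 32 raw bytes as a 64-character lowercase hex string.
fn to_hex(bytes: &[u8; 32]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Decode a 64-character hex string back into 32 bytes.
/// Returns `None` on wrong length or non-hex characters.
fn from_hex(hex: &str) -> Option<[u8; 32]> {
    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut out = [0u8; 32];
    for (i, byte) in out.iter_mut().enumerate() {
        // Each byte is two hex characters; input is ASCII, so byte
        // offsets are safe to slice on.
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(out)
}
```

BINARY(32) halves the storage, while CHAR(64) keeps the substring-based slicing shown elsewhere in this document directly usable in SQL.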
Component Filtering
use sxurl::{encode_url_to_hex, matches_component};
let urls = vec![
"https://api.github.com/repos/rust-lang/rust",
"https://docs.github.com/en/rest",
"https://github.blog/2023-01-01/announcement",
];
// Find all GitHub URLs
for url in &urls {
let sxurl = encode_url_to_hex(url)?;
if matches_component(&sxurl, "domain", "github")? {
println!("GitHub URL: {}", url);
}
}
// Find all API endpoints
for url in &urls {
let sxurl = encode_url_to_hex(url)?;
if matches_component(&sxurl, "subdomain", "api")? {
println!("API URL: {}", url);
}
}
Hash Function Access
use sxurl::ComponentHasher;
// Access individual hash functions
let tld_hash = ComponentHasher::hash_tld("com")?;
let domain_hash = ComponentHasher::hash_domain("example")?;
let path_hash = ComponentHasher::hash_path("/api/v1/users")?;
println!("TLD hash: 0x{:04x}", tld_hash);
println!("Domain hash: 0x{:015x}", domain_hash);
println!("Path hash: 0x{:015x}", path_hash);
Supported URL Schemes
- https (scheme code 0) - Default port 443
- http (scheme code 1) - Default port 80
- ftp (scheme code 2) - Default port 21
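The scheme table maps naturally onto a small enum. The sketch below mirrors the codes and default ports listed above; the `Scheme` type and its methods are hypothetical names for illustration and may not match the crate's internal types.

```rust
/// The three supported schemes (illustrative type, not the crate's own).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Scheme {
    Https,
    Http,
    Ftp,
}

impl Scheme {
    /// Parse a scheme name case-insensitively; unsupported schemes give `None`.
    fn parse(name: &str) -> Option<Self> {
        match name.to_ascii_lowercase().as_str() {
            "https" => Some(Scheme::Https),
            "http" => Some(Scheme::Http),
            "ftp" => Some(Scheme::Ftp),
            _ => None,
        }
    }

    /// 3-bit scheme code stored in the header.
    fn code(self) -> u8 {
        match self {
            Scheme::Https => 0,
            Scheme::Http => 1,
            Scheme::Ftp => 2,
        }
    }

    /// Default port for the scheme.
    fn default_port(self) -> u16 {
        match self {
            Scheme::Https => 443,
            Scheme::Http => 80,
            Scheme::Ftp => 21,
        }
    }

    /// True when `port` differs from the scheme default, which is the
    /// condition the header's "non-default port present" flag describes.
    fn is_non_default_port(self, port: u16) -> bool {
        port != self.default_port()
    }
}
```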
Use Cases
Database Indexing
Store URLs as fixed-length identifiers with efficient B-tree indexing:
CREATE TABLE url_index (
sxurl CHAR(64) PRIMARY KEY,
original_url TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
-- Index by domain hash (hex range [7..22), i.e. 1-based characters 8-22)
CREATE INDEX idx_domain ON url_index (SUBSTRING(sxurl, 8, 15));
-- Index by TLD hash (hex range [3..7), i.e. 1-based characters 4-7)
CREATE INDEX idx_tld ON url_index (SUBSTRING(sxurl, 4, 4));
URL Deduplication
Quickly identify duplicate URLs across different formats:
use sxurl::encode_url_to_hex;
use std::collections::HashSet;
let mut seen_urls = HashSet::new();
let urls = vec![
"https://Example.Com/Path",
"https://example.com/path", // Same after normalization
"https://example.com/path/", // Different path
];
for url in urls {
let sxurl = encode_url_to_hex(url)?;
if seen_urls.insert(sxurl) {
println!("New URL: {}", url);
} else {
println!("Duplicate URL: {}", url);
}
}
Fast Domain Filtering
Filter large URL datasets by domain without parsing:
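// Filter millions of URLs by domain using simple string operations.
// `all_sxurls` is assumed to be a collection of 64-character SXURL hex strings.
// The domain slice [7..22) is 15 hex characters, so the filter must be too.
let domain_filter = "1a2b3c4d5e6f789"; // Hash of target domain (placeholder value)
let urls_in_domain: Vec<_> = all_sxurls
.iter()
.filter(|sxurl| &sxurl[7..22] == domain_filter)
.collect();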
Error Handling
All functions return Result<T, SxurlError>. Common error cases:
use sxurl::{encode_url_to_hex, SxurlError};
match encode_url_to_hex("invalid-url") {
Ok(sxurl) => println!("Success: {}", sxurl),
Err(SxurlError::InvalidScheme) => println!("Unsupported URL scheme"),
Err(SxurlError::HostNotDns) => println!("Host must be a DNS name, not IP"),
Err(SxurlError::InvalidLength) => println!("Invalid SXURL format"),
Err(e) => println!("Other error: {}", e),
}
Performance
SXURL is designed for high-performance applications:
- Encoding: ~1-5 μs per URL (depending on complexity)
- Decoding: ~100-500 ns per SXURL
- Component matching: ~50-100 ns per comparison
- Memory usage: Fixed 32 bytes per SXURL, minimal temporary allocations
Benchmarks on a modern CPU:
encode_simple_url time: 1.2 μs
encode_complex_url time: 4.8 μs
decode_sxurl time: 245 ns
component_match time: 67 ns
Technical Details
Hash Function
SXURL uses labeled SHA-256 hashing: H_n(label, data) = lower_n(SHA256(label || 0x00 || data))
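- Collision resistance: a collision within a slice becomes likely after roughly 2^(n/2) distinct values (the birthday bound), where n is the bit width of that slice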
- Domain separation: Different labels produce different hashes for same data
- Deterministic: Same input always produces same output
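To put the collision-resistance figure in concrete terms, the standard birthday approximation gives the number of distinct values at which an n-bit hash has about a 50% chance of at least one collision: sqrt(2 ln 2) * 2^(n/2). The sketch below evaluates that formula. It is an estimate for an ideal hash truncated to n bits, not a measurement of this library, and `birthday_bound` is a hypothetical helper.

```rust
/// Approximate number of distinct inputs at which an ideal `bits`-wide hash
/// has a 50% chance of at least one collision: sqrt(2 ln 2) * 2^(bits / 2).
fn birthday_bound(bits: u32) -> f64 {
    (2.0 * std::f64::consts::LN_2).sqrt() * 2f64.powf(f64::from(bits) / 2.0)
}

/// Bit widths of the hashed slices, taken from the SXURL layout table.
const HASHED_SLICES: [(&str, u32); 6] = [
    ("tld_hash", 16),
    ("domain", 60),
    ("subdomain", 32),
    ("path", 60),
    ("params", 36),
    ("fragment", 24),
];

/// Pair each hashed slice with its approximate 50% collision threshold.
fn slice_bounds() -> Vec<(&'static str, f64)> {
    HASHED_SLICES
        .iter()
        .map(|&(name, bits)| (name, birthday_bound(bits)))
        .collect()
}
```

For the 16-bit tld_hash slice the bound is only about 300 distinct values, so a match on a narrow slice is best treated as a pre-filter and confirmed against the stored original URL. The 60-bit domain and path slices reach the same 50% point at over a billion distinct values.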
URL Normalization
- Scheme and host converted to lowercase
- IDNA (Internationalized Domain Names) support
- Public Suffix List (PSL) for proper domain/TLD splitting
- IP addresses rejected (DNS names only)
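Two of these rules, lowercasing the host and rejecting IP addresses, can be sketched with the standard library alone. The example deliberately leaves out IDNA conversion and Public Suffix List splitting, which need external data. `normalize_host` and its error strings are hypothetical and do not reflect the crate's actual error types.

```rust
use std::net::IpAddr;

/// Lowercase a host and reject IP literals (illustrative only; the real
/// normalization also applies IDNA and PSL rules not shown here).
fn normalize_host(host: &str) -> Result<String, &'static str> {
    // IPv6 literals appear in URLs wrapped in brackets, e.g. "[::1]".
    let bare = host
        .strip_prefix('[')
        .and_then(|h| h.strip_suffix(']'))
        .unwrap_or(host);
    if bare.is_empty() {
        return Err("empty host");
    }
    // `IpAddr::from_str` accepts both IPv4 and IPv6 textual forms.
    if bare.parse::<IpAddr>().is_ok() {
        return Err("host must be a DNS name, not an IP address");
    }
    Ok(host.to_ascii_lowercase())
}
```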
Component Handling
- Empty components stored as zero (not hashed)
- Default ports (80, 443, 21) stored explicitly
- Query parameters and fragments preserved as-is
- Path normalization preserves original structure
Testing
Run the comprehensive test suite:
cargo test # Run all tests
cargo test --doc # Test documentation examples
cargo test --release # Test optimized builds
cargo bench # Run performance benchmarks
The library includes 100+ tests covering:
- Specification compliance
- Edge cases and error conditions
- Round-trip consistency
- Hash collision resistance
- Performance regression testing
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
Specification
SXURL follows a formal specification available in SXURL-SPEC.md. The implementation is designed to be compatible with other SXURL implementations in different languages.
Changelog
See CHANGELOG.md for detailed release history.
License
Licensed under the Apache License, Version 2.0 (LICENSE or http://www.apache.org/licenses/LICENSE-2.0).
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be licensed under the Apache License, Version 2.0, without any additional terms or conditions.
SXURL - Efficient, deterministic URL identifiers for modern applications.