#domain-name #web #http #parse-url

no-std idna

IDNA (Internationalizing Domain Names in Applications) and Punycode

17 releases (4 stable)

1.0.3 Nov 4, 2024
1.0.2 Jul 1, 2024
1.0.1 Jun 18, 2024
0.5.0 Nov 22, 2023
0.1.0 Mar 30, 2016

#41 in Network programming

Download history (weekly, 2024-08-22 to 2024-12-05): between 2,640,760 and 3,808,506 downloads per week.

12,965,322 downloads per month
Used in 30,250 crates (83 directly)

MIT/Apache

145KB
2K SLoC

idna

IDNA library for Rust implementing UTS 46: Unicode IDNA Compatibility Processing as parametrized by the WHATWG URL Standard.

What it does

  • An implementation of UTS 46 is provided, with a configurable ASCII deny list (e.g. STD3 or WHATWG rules).
  • A callback mechanism lets applications plug in their own logic for deciding whether a label is potentially too misleading to render as Unicode in a user interface.
  • Errors are marked with U+FFFD REPLACEMENT CHARACTER in Unicode output so that the locations of errors can be shown to the user.
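Because errors show up as U+FFFD in the Unicode output, a user interface can point at them without a separate error-position API. A minimal sketch, assuming the Unicode output is already available as a `&str`; `error_columns` and `underline_errors` are hypothetical helpers (not part of this crate), and columns are counted in `char`s rather than rendered display width.

```rust
/// Returns the char indices of U+FFFD REPLACEMENT CHARACTERs in `display`,
/// which is assumed to be Unicode output where errors were replaced by U+FFFD.
fn error_columns(display: &str) -> Vec<usize> {
    display
        .chars()
        .enumerate()
        .filter(|&(_, c)| c == '\u{FFFD}')
        .map(|(i, _)| i)
        .collect()
}

/// Builds a marker line with '^' under each U+FFFD and spaces elsewhere.
/// Assumes one column per char, which does not hold for wide or combining characters.
fn underline_errors(display: &str) -> String {
    let marks: String = display
        .chars()
        .map(|c| if c == '\u{FFFD}' { '^' } else { ' ' })
        .collect();
    marks.trim_end().to_string()
}
```

Printing the output string and then `underline_errors` of it on the next line gives a simple caret display of where processing failed.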

What it does not do

  • There is no default/sample policy provided for the callback mechanism mentioned above.
  • Only UTS 46 is implemented: There is no API to request strictly IDNA 2008 only or strictly IDNA 2003 only.
  • There is no API for categorizing errors beyond there being an error.
  • Checks that are configurable in UTS 46 but that the WHATWG URL Standard always sets a particular way (regardless of the beStrict flag in the URL Standard) cannot be configured (with the exception of the old deprecated API supporting transitional processing).

Usage

Apps that need to prepare a hostname for use in protocols are likely to need only the top-level function domain_to_ascii_cow, with AsciiDenyList::URL as the second argument. Note that this function rejects IPv6 addresses, so before calling it, check whether the first byte of the input is b'[' and, if it is, treat the input as an IPv6 address instead.
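The IPv6 guard described above can be sketched with the standard library alone. This is a minimal sketch, not the crate's API: `HostInput` and `classify_host` are hypothetical names, the `std::net::Ipv6Addr` parser stands in for a URL-Standard IPv6 parser (the two may differ on edge cases), and the `Domain` branch is where an application would then call `domain_to_ascii_cow` with `AsciiDenyList::URL`.

```rust
use std::net::Ipv6Addr;

/// Hypothetical classification of a raw host before IDNA processing.
#[derive(Debug, PartialEq)]
enum HostInput<'a> {
    /// A bracketed IPv6 literal such as `[::1]`; it must not go through IDNA.
    Ipv6(Ipv6Addr),
    /// Anything else: bytes to hand to `domain_to_ascii_cow`.
    Domain(&'a [u8]),
}

/// Checks whether the first byte is `b'['`, as the usage notes recommend.
fn classify_host(input: &[u8]) -> Result<HostInput<'_>, &'static str> {
    if input.first() != Some(&b'[') {
        return Ok(HostInput::Domain(input));
    }
    // Strip the brackets; a missing `]` is an error rather than a domain.
    let inner = input
        .strip_prefix(b"[")
        .and_then(|rest| rest.strip_suffix(b"]"))
        .ok_or("unterminated IPv6 literal")?;
    let text = std::str::from_utf8(inner).map_err(|_| "IPv6 literal is not UTF-8")?;
    text.parse::<Ipv6Addr>()
        .map(HostInput::Ipv6)
        .map_err(|_| "invalid IPv6 address")
}
```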

Apps that need to display host names to the user should use uts46::Uts46::to_user_interface. The ToUnicode operation is rarely appropriate for direct application usage.
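Since the crate ships no default policy for the display callback, each application must decide for itself which labels are safe to show as Unicode. The sketch below is one deliberately crude, hypothetical policy: it rejects labels whose letters fall into more than one coarse script bucket (for example, Latin mixed with Cyrillic, a classic homograph pattern). The block ranges are an assumption standing in for real Unicode Script data (as used by UTS 39 mixed-script detection), and adapting `allow_unicode_label` to the exact callback signature that `to_user_interface` expects should be checked against the crate documentation.

```rust
/// Coarse script buckets; a real policy would use the Unicode Script property.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RoughScript {
    Latin,
    Greek,
    Cyrillic,
    Other,
}

/// Maps a char to a rough script bucket, or `None` for script-neutral chars.
fn rough_script(c: char) -> Option<RoughScript> {
    match c {
        '0'..='9' | '-' => None,
        'a'..='z' | 'A'..='Z' | '\u{00C0}'..='\u{024F}' => Some(RoughScript::Latin),
        '\u{0370}'..='\u{03FF}' => Some(RoughScript::Greek),
        '\u{0400}'..='\u{04FF}' => Some(RoughScript::Cyrillic),
        _ => Some(RoughScript::Other),
    }
}

/// Hypothetical policy: allow Unicode display only if every non-neutral
/// char in the label falls into the same rough script bucket.
fn allow_unicode_label(label: &[char]) -> bool {
    let mut seen: Option<RoughScript> = None;
    for &c in label {
        if let Some(script) = rough_script(c) {
            match seen {
                None => seen = Some(script),
                Some(prev) if prev != script => return false,
                Some(_) => {}
            }
        }
    }
    true
}
```

A label rejected by such a policy would be displayed in its ASCII (`xn--`) form instead.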

Cargo features

  • alloc - For future proofing. Currently always required. The crate internals may allocate on the heap, but for typical inputs they do not (apart from the output String when applicable).
  • compiled_data - For future proofing. Currently always required. (Passed through to ICU4X.)
  • std - Adds impl std::error::Error for Errors {} (and implies alloc).
  • By default, all of the above are enabled.
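In practice, the `impl std::error::Error` added by the `std` feature is what lets the crate's `Errors` type flow through `?` into `Box<dyn std::error::Error>`. The sketch below uses a hypothetical stand-in `Errors` type and a hypothetical `to_ascii_stub` function (not the crate's own items) to show the pattern; removing the `impl Error for Errors {}` line makes the `?` in `hostname_for_request` fail to compile, because that conversion requires `Errors: Error`.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical stand-in for an opaque IDNA error type.
#[derive(Debug)]
struct Errors;

impl fmt::Display for Errors {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("IDNA processing failed")
    }
}

// The impl that a `std` cargo feature would gate upstream; unconditional here.
impl Error for Errors {}

/// Hypothetical stub: accepts only non-empty ASCII input and lowercases it.
fn to_ascii_stub(domain: &str) -> Result<String, Errors> {
    if domain.is_empty() || !domain.is_ascii() {
        return Err(Errors);
    }
    Ok(domain.to_ascii_lowercase())
}

/// `?` converts `Errors` into `Box<dyn Error>` only because of the impl above.
fn hostname_for_request(domain: &str) -> Result<String, Box<dyn Error>> {
    let ascii = to_ascii_stub(domain)?;
    Ok(ascii)
}
```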

Alternative Unicode back ends

By default, idna uses ICU4X as its Unicode back end. If you wish to opt for different tradeoffs between correctness, run-time performance, binary size, compile time, and MSRV, please see the README of the latest version of the idna_adapter crate for how to opt into a different Unicode back end.

Breaking changes since 0.5.0

  • Stricter IDNA 2008 restrictions are no longer supported. Attempting to enable them panics immediately. UTS 46 allows all the names that IDNA 2008 allows, and when transitional processing is disabled, they resolve the same way. There are additional names that IDNA 2008 disallows but UTS 46 maps to names that IDNA 2008 allows (notably, input is mapped to fold-case output). UTS 46 also allows symbols that were allowed in IDNA 2003 as well as newer symbols that are allowed according to the same principle. (Earlier versions of this crate allowed rejecting such symbols. Rejecting characters that UTS 46 maps to IDNA 2008-permitted characters wasn't supported in earlier versions, either.)
  • domain_to_ascii_strict now performs the CheckHyphens check (matching previous documentation).
  • The ContextJ rules are now implemented and always enabled, even when using the old deprecated API, so input that fails those rules is rejected.
  • The Idna::to_ascii_inner method has been removed. It didn't make sense as a public method, since callers were unable to figure out if there were errors. (A GitHub search found no callers for this method.)
  • Punycode labels whose decoding does not yield any non-ASCII characters are now treated as being in error.
  • When turning off default cargo features, the cargo feature compiled_data needs to be explicitly enabled.
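Two of the rules above can be illustrated with self-contained code. The sketch below is an independent RFC 3492 Punycode decoder (not this crate's implementation) plus two hypothetical checks: `decoded_label_is_acceptable` rejects a Punycode payload that decodes to pure ASCII, and `check_hyphens` applies the UTS 46 CheckHyphens criteria (no hyphen at the start or end, and not hyphens in both the third and fourth positions) to a decoded label. Inputs are assumed to be the part after the `xn--` prefix; malformed digits and arithmetic overflow yield `None`.

```rust
// RFC 3492 parameters for Punycode.
const BASE: u32 = 36;
const T_MIN: u32 = 1;
const T_MAX: u32 = 26;
const SKEW: u32 = 38;
const DAMP: u32 = 700;
const INITIAL_BIAS: u32 = 72;
const INITIAL_N: u32 = 128;

/// Bias adaptation function from RFC 3492, section 6.1.
fn adapt(mut delta: u32, num_points: u32, first_time: bool) -> u32 {
    delta /= if first_time { DAMP } else { 2 };
    delta += delta / num_points;
    let mut k = 0;
    while delta > ((BASE - T_MIN) * T_MAX) / 2 {
        delta /= BASE - T_MIN;
        k += BASE;
    }
    k + ((BASE - T_MIN + 1) * delta) / (delta + SKEW)
}

/// Maps a Punycode digit (a-z, A-Z, 0-9) to its value 0..=35.
fn digit_value(b: u8) -> Option<u32> {
    match b {
        b'a'..=b'z' => Some(u32::from(b - b'a')),
        b'A'..=b'Z' => Some(u32::from(b - b'A')),
        b'0'..=b'9' => Some(u32::from(b - b'0') + 26),
        _ => None,
    }
}

/// Decodes a Punycode payload (the part after "xn--"); `None` if malformed.
fn punycode_decode(input: &str) -> Option<String> {
    if !input.is_ascii() {
        return None;
    }
    // Code points before the last '-' are copied literally.
    let (basic, extended) = match input.rfind('-') {
        Some(pos) => (&input[..pos], &input.as_bytes()[pos + 1..]),
        None => ("", input.as_bytes()),
    };
    let mut output: Vec<char> = basic.chars().collect();
    let (mut n, mut i, mut bias) = (INITIAL_N, 0u32, INITIAL_BIAS);
    let mut digits = extended.iter().peekable();
    while digits.peek().is_some() {
        // Decode one generalized variable-length integer into `i`.
        let old_i = i;
        let mut w = 1u32;
        let mut k = BASE;
        loop {
            let digit = digit_value(*digits.next()?)?;
            i = i.checked_add(digit.checked_mul(w)?)?;
            let t = if k <= bias {
                T_MIN
            } else if k >= bias + T_MAX {
                T_MAX
            } else {
                k - bias
            };
            if digit < t {
                break;
            }
            w = w.checked_mul(BASE - t)?;
            k += BASE;
        }
        let len = output.len() as u32 + 1;
        bias = adapt(i - old_i, len, old_i == 0);
        n = n.checked_add(i / len)?;
        i %= len;
        output.insert(i as usize, char::from_u32(n)?);
        i += 1;
    }
    Some(output.into_iter().collect())
}

/// The newer rule: a Punycode label must decode to something non-ASCII.
fn decoded_label_is_acceptable(payload: &str) -> bool {
    matches!(punycode_decode(payload), Some(label) if !label.is_ascii())
}

/// UTS 46 CheckHyphens, applied to a decoded (Unicode) label.
fn check_hyphens(label: &str) -> bool {
    let chars: Vec<char> = label.chars().collect();
    let edge = chars.first() == Some(&'-') || chars.last() == Some(&'-');
    let third_and_fourth = chars.len() >= 4 && chars[2] == '-' && chars[3] == '-';
    !edge && !third_and_fourth
}
```

For example, `xn--bcher-kva` carries the payload `bcher-kva`, which decodes to `bücher`, while a payload such as `abc-` decodes to plain `abc` and would now be treated as an error.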

Dependencies

~1.8–2.7MB
~50K SLoC