3 releases (stable)

1.1.0 Dec 30, 2021
1.0.0 Apr 28, 2021
0.0.0 Apr 28, 2021



1.5K SLoC


Implementation of the WTF-8 encoding.

WTF-8 is a hack intended to be used internally in self-contained systems with components that need to support potentially ill-formed UTF-16 for legacy reasons.

Any WTF-8 data must be converted to a Unicode encoding at the system’s boundary before being emitted. UTF-8 is recommended. WTF-8 must not be used to represent text in a file format or for transmission over the Internet.
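As a rough illustration of that boundary conversion, the sketch below turns WTF-8 bytes into UTF-8 using only the standard library. It does not use this crate's API: `wtf8_to_utf8_lossy` is a hypothetical helper name, and the choice to substitute U+FFFD for each surrogate is an assumption about what "conversion" should do with data that has no UTF-8 representation. It relies on the fact that WTF-8 encodes a surrogate (U+D800 to U+DFFF) as the three bytes `ED A0..BF 80..BF`, a pattern that never occurs in well-formed UTF-8.

```rust
/// Hypothetical helper (not part of this crate): converts WTF-8 bytes to a
/// UTF-8 `String`, replacing every encoded surrogate with U+FFFD.
/// Returns `None` if the input was not well-formed WTF-8 to begin with.
fn wtf8_to_utf8_lossy(bytes: &[u8]) -> Option<String> {
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        // In WTF-8, 0xED can only be a lead byte. If the next byte is in
        // A0..=BF, the three-byte sequence encodes a surrogate code point.
        let is_surrogate = bytes[i] == 0xED
            && i + 2 < bytes.len()
            && (0xA0..=0xBF).contains(&bytes[i + 1]);
        if is_surrogate {
            // Emit the UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER.
            out.extend_from_slice("\u{FFFD}".as_bytes());
            i += 3;
        } else {
            // Everything else is already valid UTF-8; copy it through.
            out.push(bytes[i]);
            i += 1;
        }
    }
    // Final validation catches input that was neither UTF-8 nor WTF-8.
    String::from_utf8(out).ok()
}
```

Data that contains no surrogates passes through unchanged, so the conversion only loses information where UTF-8 has no way to express it.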

In particular, the Encoding Standard [ENCODING] defines UTF-8 and other encodings for the Web. There is no and will not be any encoding label [ENCODING] or IANA charset alias [CHARSETS] for WTF-8.



Depends on the standard library’s alloc crate but not std.

  • Wtf8 and Wtf8Buf - Similar to str and String, provide type-safe WTF-8 strings.
  • CodePoint - Similar to char, provides type-safe Unicode code points.
  • Lossless conversion from potentially ill-formed UTF-16 to a CodePoint iterator, from CodePoint iterators to Wtf8Buf, and from str to Wtf8.
  • Conversion from Wtf8 to String, potentially lossy.
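To make the lossless and lossy directions concrete, here is a minimal standard-library sketch of the underlying encoding step. It deliberately avoids this crate's types, whose exact method names are not shown here; `utf16_to_wtf8` and `utf16_to_string_lossy` are hypothetical names used only for illustration. Surrogate pairs are combined into a single supplementary code point, while an unpaired surrogate is kept and written as a three-byte sequence, which is the part that plain UTF-8 forbids.

```rust
/// Hypothetical helper (not this crate's API): losslessly encodes
/// potentially ill-formed UTF-16 as WTF-8 bytes.
fn utf16_to_wtf8(units: &[u16]) -> Vec<u8> {
    let mut out = Vec::new();
    for decoded in char::decode_utf16(units.iter().copied()) {
        match decoded {
            // Scalar values (including combined surrogate pairs) are
            // encoded exactly as in UTF-8.
            Ok(c) => {
                let mut buf = [0u8; 4];
                out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
            }
            // An unpaired surrogate is kept, using the generalized
            // three-byte pattern 1110xxxx 10xxxxxx 10xxxxxx.
            Err(e) => {
                let s = e.unpaired_surrogate();
                out.push(0xE0 | (s >> 12) as u8);
                out.push(0x80 | ((s >> 6) & 0x3F) as u8);
                out.push(0x80 | (s & 0x3F) as u8);
            }
        }
    }
    out
}

/// Hypothetical helper: the lossy direction, where each unpaired
/// surrogate becomes U+FFFD so the result is a valid `String`.
fn utf16_to_string_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}
```

For example, the input `[0x0061, 0xD800, 0x0062]` encodes to the bytes `61 ED A0 80 62`, from which the original code units can be recovered, whereas the lossy path yields `"a\u{FFFD}b"` and the surrogate is gone.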


Licensed under either of

at your option.


Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

No runtime deps