
german-str

small-string optimized string type with fast comparisons

1 unstable release

0.1.0 Aug 20, 2024


MIT license

34KB
657 lines


German strings are a string type with the following properties:

  • They are immutable.
  • size_of::<GermanStr>() == 16
  • They must be shorter than 2^32 bytes, since the length is stored as a u32.
  • Strings of 12 bytes or fewer are stored entirely inline (e.g. on the stack), with no heap allocation.
  • Comparisons that can be decided from the first 4 bytes alone are very fast.
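To illustrate why prefix-decidable comparisons are cheap, here is a minimal sketch, assuming only that the length and a zero-padded 4-byte prefix sit inside the struct. `Header`, `quick_eq` and `quick_cmp` are hypothetical names, not the crate's API: each function answers from the inline data alone when it can, and returns `None` when the remaining bytes would have to be compared.

```rust
use std::cmp::Ordering;

/// Hypothetical inline header: the length and first 4 bytes that a
/// German string keeps inside its 16-byte struct (not the crate's API).
struct Header {
    len: u32,
    prefix: [u8; 4], // zero-padded when the string is shorter than 4 bytes
}

impl Header {
    fn new(s: &str) -> Header {
        let bytes = s.as_bytes();
        assert!(bytes.len() <= u32::MAX as usize, "string too long");
        let mut prefix = [0u8; 4];
        let n = bytes.len().min(4);
        prefix[..n].copy_from_slice(&bytes[..n]);
        Header { len: bytes.len() as u32, prefix }
    }
}

/// Equality fast path: Some(answer) if the header alone decides it,
/// None if the bytes after the prefix must still be compared.
fn quick_eq(a: &Header, b: &Header) -> Option<bool> {
    if a.len != b.len || a.prefix != b.prefix {
        Some(false)
    } else if a.len <= 4 {
        Some(true) // the whole string fits in the prefix
    } else {
        None
    }
}

/// Ordering fast path: zero-padded prefixes compare lexicographically
/// like the strings themselves, so a differing prefix decides the order.
fn quick_cmp(a: &Header, b: &Header) -> Option<Ordering> {
    match a.prefix.cmp(&b.prefix) {
        Ordering::Equal => None,
        other => Some(other),
    }
}

fn main() {
    let apple = Header::new("apple pie");
    let banana = Header::new("banana bread");
    let apron = Header::new("apron strings");
    println!("{:?}", quick_eq(&apple, &banana)); // Some(false): lengths differ
    println!("{:?}", quick_cmp(&apple, &banana)); // Some(Less): "appl" < "bana"
    println!("{:?}", quick_cmp(&apple, &apron)); // Some(Less): "appl" < "apro"
    println!("{:?}", quick_eq(&apple, &Header::new("apple pit"))); // None: same length and prefix
}
```

Random strings almost always land in the `Some(...)` branches; identical long strings always fall through to `None`, which is the worst case.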

They are described here. TL;DR: a GermanStr is a 16-byte struct where:

  • The first 4 bytes of the struct are a u32 holding the length of the string.
  • The first 4 bytes of the string (its prefix) are stored right after.
  • If the rest of the string fits in the remaining 8 bytes, it is stored there directly.
  • Otherwise, the last 8 bytes hold a pointer to the string buffer on the heap (which includes the 4-byte prefix).
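The layout above can be mirrored with a `#[repr(C)]` struct and a union for the last 8 bytes. This is a hypothetical sketch, not the crate's actual definition: `GermanLayout`, `Rest`, `make_inline` and `to_inline_bytes` are illustrative names, and only the inline case is built (the heap variant is declared but never allocated here).

```rust
use std::mem::size_of;

/// Hypothetical mirror of the described layout (not german-str's definition).
#[repr(C)]
struct GermanLayout {
    len: u32,        // bytes 0..4: length of the string
    prefix: [u8; 4], // bytes 4..8: first 4 bytes of the string
    rest: Rest,      // bytes 8..16: inline suffix or heap pointer
}

#[repr(C)]
union Rest {
    inline: [u8; 8], // used when len <= 12
    heap: *const u8, // used when len > 12 (full buffer, prefix included); unused in this sketch
}

/// 4 prefix bytes + 8 inline bytes.
const MAX_INLINE: usize = 12;

fn is_inline(len: usize) -> bool {
    len <= MAX_INLINE
}

/// Builds the inline representation, or None if the string needs the heap.
fn make_inline(s: &str) -> Option<GermanLayout> {
    let b = s.as_bytes();
    if !is_inline(b.len()) {
        return None;
    }
    let mut prefix = [0u8; 4];
    let mut inline = [0u8; 8];
    let p = b.len().min(4);
    prefix[..p].copy_from_slice(&b[..p]);
    if b.len() > 4 {
        inline[..b.len() - 4].copy_from_slice(&b[4..]);
    }
    Some(GermanLayout { len: b.len() as u32, prefix, rest: Rest { inline } })
}

impl GermanLayout {
    /// Reassembles an inline string; only valid when len <= 12.
    fn to_inline_bytes(&self) -> Vec<u8> {
        let len = self.len as usize;
        assert!(is_inline(len));
        let mut out = Vec::with_capacity(len);
        out.extend_from_slice(&self.prefix[..len.min(4)]);
        if len > 4 {
            // SAFETY: make_inline always writes the `inline` variant.
            let inline = unsafe { self.rest.inline };
            out.extend_from_slice(&inline[..len - 4]);
        }
        out
    }
}

fn main() {
    println!("{}", size_of::<GermanLayout>()); // 16 on a 64-bit target
    let s = make_inline("hello, world").expect("12 bytes fit inline");
    println!("{}", String::from_utf8(s.to_inline_bytes()).unwrap()); // hello, world
    println!("{}", make_inline("hello, world!").is_none()); // true: 13 bytes need the heap
}
```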

The implementation was heavily inspired by SmolStr.

The main downside of GermanStr compared to SmolStr is that heap buffers aren't shared between instances by default. Sharing is enabled by calling leaky_shared_clone, which clones in O(1) time but introduces the risks associated with manual memory management.

Requirements

  • A 64-bit target: #[cfg(target_pointer_width = "64")].
  • The crate is compatible with #![no_std].
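The 64-bit requirement follows from the layout: the last 8 bytes must be able to hold a heap pointer, and pointers are 8 bytes wide only on 64-bit targets. Below is one way a crate could enforce this at compile time; it is an illustrative sketch, not code taken from german-str, and `MAX_LEN` is a hypothetical name. The two guards use only `core`, so they would also build under #![no_std].

```rust
// Refuse to compile on targets where a pointer is not 64 bits wide.
#[cfg(not(target_pointer_width = "64"))]
compile_error!("this sketch assumes a 64-bit target");

// Compile-time check: the last 8 bytes of the struct can hold a pointer.
const _: () = assert!(core::mem::size_of::<*const u8>() == 8);

/// Hypothetical constant: the largest length a u32 length field can store.
pub const MAX_LEN: u64 = u32::MAX as u64;

fn main() {
    println!("{}", MAX_LEN); // 4294967295, i.e. 2^32 - 1
}
```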

Benchmarks

The following plots are generated by the crate's benchmarks. In the first half of the rows, comparisons are made on random ASCII strings, so the vast majority of comparisons only need to compare prefixes. In the second half (worst cases), the strings being compared are identical, so every pair of bytes has to be compared. Unless the strings are short enough to be inlined, performance is then equivalent to comparing two regular Strings.

[Figure: benchmark plots (benches)]

Dependencies

~210KB