#string #umbra-string #german-string

strumbra

An implementation for Umbra-style strings (also known as German strings)

11 releases (4 breaking)

0.5.2 Oct 18, 2024
0.5.1 Sep 27, 2024
0.4.1 Sep 13, 2024
0.3.2 Sep 4, 2024
0.1.0 Aug 28, 2024

MIT license

47KB
980 lines

Strumbra

An implementation of the string data structure described in the paper Umbra: A Disk-Based System with In-Memory Performance.

Three types are implemented:

  • BoxString behaves like a Box<str>.
  • ArcString behaves like a Arc<str>.
  • RcString behaves like a Rc<str>.

Additionally, we define the following type aliases:

  • UniqueString = BoxString<4>
  • SharedString = ArcString<4>

Properties

  • Strings are immutable.
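  • The maximum string length is u32::MAX bytes.
  • Strings of at most 12 bytes are stored inline (on the stack), with no heap allocation.
  • Comparison and ordering are relatively fast and cache-friendly for most strings, because the length and the first 4 bytes (the prefix) are stored inline, so many comparisons are decided without following a pointer.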
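The properties above follow from the 16-byte header used by Umbra-style strings: a 4-byte length, a 4-byte prefix, and 8 more bytes holding either the rest of a short string or a pointer to heap data. The sketch below is a hypothetical illustration of that layout, assuming a #[repr(C)] struct with a union; UmbraHeader, Tail, INLINE_CAPACITY, and new_inline are names made up for this example and are not strumbra's API.

```rust
use std::mem::size_of;

/// Hypothetical 16-byte Umbra-style header (illustration only, not strumbra's source).
#[repr(C)]
pub struct UmbraHeader {
    /// Length in bytes; storing it as a u32 is what caps strings at u32::MAX bytes.
    pub len: u32,
    /// First 4 bytes of the string, present for every length (zero-padded if shorter).
    pub prefix: [u8; 4],
    /// Last 8 bytes: either more inline data or a pointer to the heap.
    pub tail: Tail,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub union Tail {
    /// Bytes 4..12 of a short string (len <= 12), zero-padded.
    pub inline: [u8; 8],
    /// Pointer to the full heap-allocated bytes of a long string (len > 12).
    pub heap: *const u8,
}

/// Bytes that fit without a heap allocation: the 16-byte header minus the 4-byte length.
pub const INLINE_CAPACITY: usize = size_of::<UmbraHeader>() - size_of::<u32>();

impl UmbraHeader {
    /// Builds a header for a short string; returns None if the string would need the heap.
    pub fn new_inline(s: &str) -> Option<UmbraHeader> {
        let bytes = s.as_bytes();
        if bytes.len() > INLINE_CAPACITY {
            return None;
        }
        // Lay the bytes out contiguously, zero-padded, then split into prefix + tail.
        let mut buf = [0u8; INLINE_CAPACITY];
        buf[..bytes.len()].copy_from_slice(bytes);
        let mut prefix = [0u8; 4];
        prefix.copy_from_slice(&buf[..4]);
        let mut inline = [0u8; 8];
        inline.copy_from_slice(&buf[4..]);
        Some(UmbraHeader { len: bytes.len() as u32, prefix, tail: Tail { inline } })
    }
}
```

Because the length takes 4 of the 16 bytes, 12 bytes remain for inline data, which is where the 12-byte threshold comes from.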

Benchmarks

Very simple micro-benchmarks were run to compare the performance of ordering strings of three types: String, UniqueString, and SharedString. As expected, UniqueString and SharedString perform the same, since they share the exact same comparison implementation. When comparing two different random strings, UniqueString and SharedString are much faster than String, because most comparisons are decided by the first few bytes alone.

Figure: Comparing random strings
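A minimal sketch of why random strings compare quickly, assuming the 4-byte prefix is compared as a single integer before any full comparison. prefix_key and umbra_cmp are hypothetical helpers operating on plain byte slices, not strumbra's implementation; the prefix is read as big-endian so that integer order matches lexicographic byte order.

```rust
use std::cmp::Ordering;

/// First 4 bytes of `s`, zero-padded, as a big-endian integer.
/// Big-endian keeps integer order consistent with byte-wise lexicographic order.
fn prefix_key(s: &[u8]) -> u32 {
    let mut p = [0u8; 4];
    let n = s.len().min(4);
    p[..n].copy_from_slice(&s[..n]);
    u32::from_be_bytes(p)
}

/// Hypothetical prefix-first ordering for Umbra-style strings.
pub fn umbra_cmp(a: &[u8], b: &[u8]) -> Ordering {
    match prefix_key(a).cmp(&prefix_key(b)) {
        // Slow path: prefixes tie, so fall back to a full byte-wise comparison.
        Ordering::Equal => a.cmp(b),
        // Fast path: one integer comparison decided the order.
        decided => decided,
    }
}
```

For two random strings the prefixes almost always differ, so the match returns on the fast path; only equal prefixes fall through to the full comparison.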

On the other hand, when comparing two identical strings, UniqueString and SharedString beat String only when the strings are 4 bytes long, so that only the prefixes need to be compared. Otherwise, because of the extra conditional branches, UniqueString and SharedString perform about the same as String while the strings are short enough to be inlined (at most 12 bytes), and worse once they are not.

Figure: Comparing identical strings
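The branches responsible for this can be sketched with a simplified, safe model of the header. The Umbra struct and its PartialEq impl below are hypothetical: a real implementation packs these fields into 16 bytes, but the order of checks illustrates why identical strings of at most 4 bytes are cheap, while longer identical strings pay for extra branches and, past 12 bytes, for a heap access.

```rust
/// Hypothetical safe model of an Umbra-style string (not strumbra's types).
pub struct Umbra {
    len: u32,
    prefix: [u8; 4],        // first 4 bytes, zero-padded
    inline: [u8; 8],        // bytes 4..12 when len <= 12, zero-padded
    heap: Option<Box<[u8]>>, // full bytes when len > 12
}

impl Umbra {
    pub fn new(s: &str) -> Umbra {
        let b = s.as_bytes();
        assert!(b.len() <= u32::MAX as usize, "longer than u32::MAX bytes");
        let mut prefix = [0u8; 4];
        let mut inline = [0u8; 8];
        let n = b.len().min(4);
        prefix[..n].copy_from_slice(&b[..n]);
        let heap: Option<Box<[u8]>> = if b.len() <= 12 {
            if b.len() > 4 {
                inline[..b.len() - 4].copy_from_slice(&b[4..]);
            }
            None
        } else {
            Some(b.to_vec().into_boxed_slice())
        };
        Umbra { len: b.len() as u32, prefix, inline, heap }
    }
}

impl PartialEq for Umbra {
    fn eq(&self, other: &Self) -> bool {
        // Check 1: length and prefix (the first 8 bytes of the real header).
        if self.len != other.len || self.prefix != other.prefix {
            return false;
        }
        // Check 2: a string of at most 4 bytes is fully described by its prefix.
        if self.len <= 4 {
            return true;
        }
        // Check 3: inlined strings compare the rest of the header, no pointer chasing.
        if self.len <= 12 {
            return self.inline == other.inline;
        }
        // Check 4: long strings must dereference both heap allocations.
        self.heap == other.heap
    }
}
```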

Dependencies

~170KB