#string-interning #string #interning #cache #hash-map

interned-string

High-performance and concurrent string interning for general purpose use

3 releases (breaking)

0.3.0 Jul 6, 2024
0.2.0 Jul 4, 2024
0.1.0 Jul 4, 2024

#121 in Caching

MPL-2.0 license

36KB
625 lines

interned-string

This crate exposes IString, an immutable and interned string type.

It's built for high performance and with multithreading in mind.

It provides O(1) Hash and Eq operations, perfect for your HashMap<IString, _>.

Getting Started

You can intern any String or &str value by calling intern().

You can pass an IString by reference to any function that accepts a &str.

use interned_string::Intern;

fn main() {
    let my_istring = "hello".intern();

    foo(&my_istring);
}

fn foo(string: &str) {
    println!("{string}");
}

If you enable the serde feature, you can use IString in place of String in your DTOs.

[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
interned-string = { version = "0.1", features = ["serde"] }
use serde::Deserialize;
use interned_string::IString;

#[derive(Deserialize, Debug)]
struct ExampleDTO {
    favorite_dish: IString
}

fn main() {
    let serialized = "{\"favorite_dish\":\"pasta\"}";

    let example: ExampleDTO = serde_json::from_str(&serialized).unwrap();

    println!("{example:?}")
    // ExampleDTO { favorite_dish: IString("pasta") }
}

Performance Characteristics

Reading an IString's contents is very fast, lock free and wait free (thanks to left_right). The IString can be shared and read from any number of threads. It scales linearly with the number of reading threads.

The tradeoff is that creating a new IString is slower. A radix tree (compact trie) needs to be traversed to deduplicate the new string, a lock needs to be acquired, and the tree needs to be updated if the string wasn't interned yet. While the tree walk can be done in parallel from multiple threads, the lock prevents linear scaling for writes.

Planned Improvements

  • Replace or rewrite the radix tree to make it reuse the string storage, instead of storing a clone of the each interned string. Currently the crate uses 2x the interned string storage space because of this (1x in storage, 1x as a clone in the radix tree).

  • Free the memory of unused strings in the background / in a cold path. Currenly the memory is only free'd the next time a new string is inserted in storage, which may be unexpected.

Contributing

Feel free to open a PR. Any contribution is greatly appreciated.

If you have a suggestion, please open an Issue.

License

Mozilla Public License 2.0

Acknowledgments

Special thanks to Jon Gjengset (@jonhoo) for providing the inspiration for this crate, and for his work on left-right.

Dependencies

~0.8–27MB
~344K SLoC