#string-interning #string-cache #string #interning #symbol-table #symbol #intern

arc-string-interner

An efficient cuncurrent string interning data structure with minimal memory-footprint and fast access to the underlying contents

3 unstable releases

0.3.0-alpha2 Nov 8, 2021
0.3.0-alpha1 Oct 11, 2021
0.1.1 Jun 16, 2021
0.1.0 Apr 1, 2020

#1082 in Data structures

MIT/Apache

50KB
803 lines

String Interner

Linux Windows Codecov Coveralls Docs Crates.io
travisCI appveyor codecov coveralls docs crates

A data structure to cache strings efficiently, with minimal memory footprint and the ability to assicate the interned strings with unique symbols. These symbols allow for constant time comparisons and look-ups to the underlying interned string contents. Also, iterating through the interned strings is cache efficient.

Internals

  • Internally a hashmap M and a vector V is used.
  • V stores the contents of interned strings while M has internal references into the string of V to avoid duplicates.
  • V stores the strings with an indirection to avoid iterator invalidation.
  • Returned symbols usually have a low memory footprint and are efficiently comparable.

Planned Features

  • Safe abstraction wrapper that protects the user from the following misusages:
    • Using symbols of a different string interner instance to resolve string in another.
    • Using symbols that are already no longer valid (i.e. the associated string interner is no longer available).

License

Licensed under either of

at your option.

Dual licence: badge badge

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Changelog

  • 0.7.1

    • CRITICAL fix use after free bug in StringInterner::clone()
    • implement std::iter::Extend for StringInterner
    • Sym::from_usize now avoids using unsafe code
    • optimize FromIterator impl of StringInterner
    • move to Rust 2018 edition

    Thanks YOSHIOKA Takuma for implementing this release.

  • 0.7.0

    • changed license from MIT to MIT/APACHE2.0
    • removed generic impl of Symbol for types that are From<usize> and Into<usize>
    • removed StringInterner::clear API since its usage breaks invariants
    • added StringInterner::{capacity, reserve} APIs
    • introduced a new default symbol type Sym that is a thin wrapper around NonZeroU32 (idea by koute)
    • made DefaultStringInterner a type alias for the new StringInterner<Sym>
    • added convenient FromIterator impl to StringInterner<S: Sym>
    • dev
      • rewrote all unit tests (serde tests are still missing)
      • entirely refactored benchmark framework
      • added html_root_url to crate root

    Thanks matklad for suggestions and impulses

  • 0.6.3

    • fixed a bug that StringInterner's Send impl didn't respect its generic HashBuilder parameter. Fixes GitHub issue #4.
  • 0.6.2

    • added shrink_to_fit public method to StringInterner - (by artemshein)
  • 0.6.1

    • fixed a bug that inserting non-owning string types (e.g. str) was broken due to dangling pointers (Thanks to artemshein for fixing it!)
  • 0.6.0

    • added optional serde serialization and deserialization support
    • more efficient and generic PartialEq implementation for StringInterner
    • made StringInterner generic over BuildHasher to allow for custom hashers
  • 0.5.0

    • added IntoIterator trait implementation for StringInterner
    • greatly simplified iterator code
  • 0.4.0

    • removed restrictive constraint for Unsigned for Symbol
  • 0.3.3

    • added Send and Sync to InternalStrRef to make StringInterner itself Send and Sync

Dependencies

~0.5–1MB
~17K SLoC