#string-interning #interning #string #parser #weak-references #nounsafe

intern-all

A safe and predictable interner for data of mixed and arbitrary type

9 unstable releases (3 breaking)

0.4.1 Feb 23, 2024
0.4.0 Feb 23, 2024
0.3.1 Feb 23, 2024
0.2.4 Feb 22, 2024
0.1.0 Nov 18, 2023

#1777 in Rust patterns


263 downloads per month
Used in orchidlang

MIT license

19KB
336 lines

An interner for data of mixed or arbitrary type. It holds values through weak references and uses the default allocator, so it is suitable for long-running processes.

use std::env;
use std::path::PathBuf;

use intern_all::{i, Tok};

// Intern a value
let a: Tok<String> = i("foo");
// Intern a path
let b: Tok<PathBuf> = i(&env::current_dir().unwrap());

Some convenience functions are also provided to make working with lists easier:

use intern_all::{i, ibv, iv, Tok};

// Intern a list as a slice of tokens
let v1: Tok<Vec<Tok<String>>> = i(&[i("bar"), i("quz"), i("quux")][..]);
// Intern a list of internable values
let v2: Tok<Vec<Tok<String>>> =
  iv(["bar".to_string(), "quz".to_string(), "quux".to_string()]);
// Intern a list of the borrowed form of internable values
let v3: Tok<Vec<Tok<String>>> = ibv(["bar", "quz", "quux"]);
assert!(v1 == v2 && v2 == v3);

The interner uses weak references, but entries for unreferenced values still take up space in the token table. To avoid a memory leak, you can periodically remove these entries from the interner with sweep or sweep_t.

use intern_all::{sweep, sweep_t};

// use this for general housekeeping
sweep();
// use this if a lot of temporary values of a particular interned type
// have been dropped recently
sweep_t::<String>();
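
To illustrate why sweeping is needed, here is a minimal sketch of a weak-reference interner for a single type, built only on the standard library. It is not the intern-all implementation: WeakInterner, intern, sweep and len are hypothetical names chosen for this example, and the real crate handles arbitrary types and may organize its table differently. The point is only that the table holds Weak handles, so a dropped value is freed but its table entry stays until a sweep removes it.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical single-type interner used for illustration only;
/// this is NOT how intern-all is implemented internally.
struct WeakInterner {
    // The table owns the key but only a weak handle to the shared value.
    table: Mutex<HashMap<String, Weak<str>>>,
}

impl WeakInterner {
    fn new() -> Self {
        Self { table: Mutex::new(HashMap::new()) }
    }

    /// Return the shared allocation for `s`, creating a new one if the
    /// string was never interned or its previous allocation was dropped.
    fn intern(&self, s: &str) -> Arc<str> {
        let mut table = self.table.lock().unwrap();
        if let Some(live) = table.get(s).and_then(Weak::upgrade) {
            return live;
        }
        let fresh: Arc<str> = Arc::from(s);
        table.insert(s.to_owned(), Arc::downgrade(&fresh));
        fresh
    }

    /// Remove entries whose value has no remaining strong reference.
    /// Returns the number of entries removed.
    fn sweep(&self) -> usize {
        let mut table = self.table.lock().unwrap();
        let before = table.len();
        table.retain(|_, weak| weak.strong_count() > 0);
        before - table.len()
    }

    /// Number of entries currently in the table, live or dead.
    fn len(&self) -> usize {
        self.table.lock().unwrap().len()
    }
}
```

In this sketch, interning "foo" twice yields two handles to the same allocation. After both handles are dropped, the entry is still counted by len until sweep is called, which is the table growth that periodic sweeping is meant to prevent.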

Dependencies

~3.5MB
~62K SLoC