3 releases (breaking)
0.3.0 | Dec 30, 2024 |
---|---|
0.2.0 | Dec 25, 2024 |
0.1.0 | Dec 19, 2024 |
#254 in Data structures
449 downloads per month
495KB
13K
SLoC
Frozen Collections - Fast Partially Immutable Collections
- Summary
- Handling Compile-Time Data
- Using in a Build Script
- Handling Runtime Data
- Traits
- Performance Considerations
- Analysis and Optimizations
- Cargo Features
Summary
Frozen collections are designed to deliver improved
read performance relative to the standard HashMap
and
HashSet
types. They are ideal for use with long-lasting collections
which get initialized when an application starts and remain unchanged
permanently, or at least extended periods of time. This is a common
pattern in service applications.
As part of creating a frozen collection, analysis is performed over the data that the collection will hold to determine the best layout and algorithm to use to deliver optimal performance. Depending on the situation, sometimes the analysis is done at compile-time whereas in other cases it is done at runtime when the collection is initialized. This analysis can take some time, but the value in spending this time up front is that the collections provide faster read-time performance.
Frozen maps are only partially immutable. The keys associated with a frozen map are determined at creation time and cannot change, but the values can be updated at will if you have a mutable reference to the map. Frozen sets however are completely immutable and so never change after creation.
See BENCHMARKS.md for current benchmark numbers.
Handling Compile-Time Data
If you know the keys and values that will be in your collection at compile time, you can use
one of eight macros to create frozen collections:
fz_hash_map!
,
fz_ordered_map!
,
fz_scalar_map!
,
fz_string_map!
,
fz_hash_set!
,
fz_ordered_set!
,
fz_scalar_set!
, or
fz_string_set!
.
These macros analyze the data you provide
and return a custom implementation type that's optimized for the data. All the
possible types implement the
Map
or
Set
traits.
The macros exist in a short form and a long form, described below.
Short Form
With the short form, you supply the data that goes into the collection and get in return an initialized collection of an unnamed type. For example:
use frozen_collections::*;
let m = fz_string_map!({
"Alice": 1,
"Bob": 2,
"Sandy": 3,
"Tom": 4,
});
At build time, the macro analyzes the data supplied and determines the best map
implementation type to use. As such, the type of m
is not known to this code. m
will
always implement the Map
trait however, so you can leverage type inference even though
you don't know the actual type of m
:
use frozen_collections::*;
fn main() {
let m = fz_string_map!({
"Alice": 1,
"Bob": 2,
"Sandy": 3,
"Tom": 4,
});
more(m);
}
fn more<M>(m: M)
where
M: Map<&'static str, i32>
{
assert!(m.contains_key(&"Alice"));
}
Long Form
The long form lets you provide a type alias name which will be created to correspond to the collection implementation type chosen by the macro invocation. Note that you must use the long form if you want to declare a static frozen collection.
use frozen_collections::*;
fz_string_map!(static MAP: MyMapType<&'static str, i32>, {
"Alice": 1,
"Bob": 2,
"Sandy": 3,
"Tom": 4,
});
The above creates a static variable called MAP
with keys that are strings and values which are
integers. As before, you don't know the specific implementation type selected by the macro, but
this time you have a type alias (i.e. MyMapType
) representing that type. You can then use this alias
anywhere you'd like to in your code where you'd like to mention the type explicitly.
To use the long form for non-static uses, replace static
with let
:
use frozen_collections::*;
fz_string_map!(let m: MyMapType<&'static str, i32>, {
"Alice": 1,
"Bob": 2,
"Sandy": 3,
"Tom": 4,
});
more(m);
struct S {
m: MyMapType,
}
fn more(m: MyMapType) {
assert!(m.contains_key("Alice"));
}
Using in a Build Script
You can use the
CollectionEmitter
,
struct to initialize a frozen collections from a build
script and output the results in a file that then gets compiled into your application. Due
to the fact build scripts run in a richer environment than procedural macros, the resulting
efficiency of collections generated from build scripts can be slightly faster than the ones
generated with the macros.
Handling Runtime Data
If you don't know the exact keys and values that will be in your collection at compile time,
you use the dedicated map and collection types to hold your data:
FzHashMap
,
FzOrderedMap
,
FzScalarMap
,
FzStringMap
,
FzHashSet
,
FzOrderedSet
,
FzScalarSet
, or
FzStringSet
.
These
types analyze the data you provide at runtime and determine the best strategy to handle your
data dynamically.
use frozen_collections::*;
let v = vec![
("Alice", 1),
("Bob", 2),
("Sandy", 3),
("Tom", 4),
];
let m = FzStringMap::new(v);
Note that in general, if possible, it's more efficient to use the macros to create your frozen collection instances.
Traits
The maps produced by this crate implement the following traits:
Map
. The primary representation of a map. This trait hasMapQuery
andMapIteration
as super-traits.MapQuery
. A trait for querying maps. This is an object-safe trait.MapIteration
. A trait for iterating over maps.
The sets produced by this crate implement the following traits:
Set
. The primary representation of a set. This trait hasSetQuery
,SetIteration
andSetOps
as super-traits.SetQuery
. A trait for querying sets. This is an object-safe trait.SetIteration
. A trait for iterating over sets.SetOps
. A trait for set operations like union and intersections.
Performance Considerations
The analysis performed when creating maps tries to find the best concrete implementation type given the data at hand. The macros perform analysis at build time and generally produce slightly faster results. The collection types meanwhile perform analysis at runtime and the resulting collections are slightly slower.
When creating static collections using the macros, the collections produced can often be embedded directly as constant data into the binary of the application, thus requiring no initialization time and no heap space. at This also happens to be the fastest form for these collections. When possible, this happens automatically, you don't need to do anything special to enable this behavior.
Analysis and Optimizations
Unlike normal collections, the frozen collections require you to provide all the data for the collection when you create the collection. The data you supply is analyzed which determines which specific underlying implementation strategy to use and how to lay out data internally.
The available implementation strategies are:
-
Scalar as Hash. When the keys are of an integer or enum type, this uses the keys themselves as hash codes, avoiding the overhead of hashing.
-
Length as Hash. When the keys are of a string type, the length of the keys are used as hash code, avoiding the overhead of hashing.
-
Dense Scalar Lookup. When the keys represent a contiguous range of integer or enum values, lookups use a simple array access instead of hashing.
-
Sparse Scalar Lookup. When the keys represent a sparse range of integer or enum values, lookups use a sparse array access instead of hashing.
-
Left Hand Substring Hashing. When the keys are of a string type, this uses sub-slices of the keys for hashing, reducing the overhead of hashing.
-
Right Hand Substring Hashing. Similar to the Left Hand Substring Hashing from above, but using right-aligned sub-slices instead.
-
Linear Scan. For very small collections, this avoids hashing completely by scanning through the entries in linear order.
-
Ordered Scan. For very small collections where the keys implement the
Ord
trait, this avoids hashing completely by scanning through the entries in linear order. Unlike the Linear Scan strategy, this one can early exit when keys are not found during the scan. -
Classic Hashing. This is the fallback when none of the previous strategies apply. The frozen implementations are generally faster than
HashMap
andHashSet
. -
Binary Search. For relatively small collections where the keys implement the
Ord
trait, classic binary searching is used. -
Eytzinger Search. For larger collections where the keys implement the
Ord
trait, a cache-friendly Eytzinger search is used.
Cargo Features
You can specify the following features when you include the frozen_collections
crate in your
Cargo.toml
file:
macros
. Enables the macros for creating frozen collections.emit
. Enables theCollectionEmitter
struct that lets you create frozen collections from a build script.serde
. Enables serialization and deserialization support for the frozen collections.std
. Enables small features only available when building with the standard library.
All features are enabled by default.
Dependencies
~1.3–2MB
~29K SLoC