#arc #data-structure #sync

pour

Optionally consed radix tries for fast set operations

5 releases

0.2.1 Nov 14, 2020
0.2.0 Nov 13, 2020
0.1.3 Nov 13, 2020
0.1.0 Sep 29, 2020

#876 in Concurrency

MIT/Apache

130KB
2.5K SLoC

pour

crates.io Downloads Documentation Pipeline status codecov License: MIT

pour is an implementation of an immutable IdMap: it maps bitvector IDs to values using a radix trie.

Since these IdMaps are immutable, they can share a lot of data between them, and hash-consing can be used to increase the degree of sharing between IdMaps. More interestingly, this data structure is designed to support asymptotically fast set operations on hash-consed IdMaps, including:

  • Unions, intersections, (symmetric) differences, and complements
  • Subset/superset checks

The best part is, the more memory shared, the faster these operations become in the general case (though the specialized ptr variants of these operations may return incorrect values on non hash-consed, i.e. maximally shared, inputs!) To allow user customized hash-consing strategies, the internal Arcs behind this data structure can be exposed as opaque objects which the user may manipulate using the ConsCtx trait. Alternatively, () implements SetCtx with no consing, and there are helpers to perform set operations without consing.

There are also some nice implementation details (which may change), including:

  • IdMap<K, V> and hence IdSet<K> are the size of a pointer.
  • NonEmptyIdMap<K, V> and hence NonEmptyIdSet<K> are the size of a pointer and null-pointer optimized, i.e. Option<NonEmptySet<T>> is also the size of a pointer.

Right now, the feature-set is deliberately kept somewhat minimal, as pour has a particular use case (the rain intermediate representation). But if I have time and/or anyone wants to contribute, all kinds of things can be added! Examples of potential future features include

  • Map not just from integer keys but from integer ranges, with similar efficiency
  • Union maps of different types

pour is currently implemented without any unsafe, though that may change. We do, however use the non-standard elysees Arc implementation (a fork of Servo's triomphe by the author) to avoid weak reference counts.

NOTE: "levels" are currently not yet supported! Returning a level number greater than 0 will cause a panic!

Contributions, questions, and issues are welcome! Please file issues at https://gitlab.com/tekne/pour, and contact the author at jad.ghalayini@gtc.ox.ac.uk for any other queries.

Dependencies

~1.5–2MB
~37K SLoC