#queries #acyclic #programming #key #analytical #join #search

coppice

Dynamic programming library for acyclic analytical queries

1 unstable release

new 0.1.0 Jan 20, 2025

#137 in Caching

27 downloads per month

MIT license

39KB
904 lines

Coppice is a dynamic programming library for acyclic analytical queries expressed as nested map/reduce computations. These map/reduce computations are automatically cached and parallelised when executed with the [map_reduce()] higher-order function.

Of course, since we know we're only interested in full blown [map_reduce()] results, we actually memoise fully aggregated results for each "tablet" (small set of rows) and opaque params.

In addition to these standard inputs (and cache keys), Coppice also passes "join keys" to the "map" functions. This third type of inputs (in addition to rows and opaque params) enables Coppice to offer asymptotically superior performance compared to pure memoisation: [map_reduce()] essentially executes "map" functions "in reverse," for join keys.

The [map_reduce()] function thus ends up evaluating the "map" function for each row and params, but extracts all join keys that, combined with the row and params, yields a non-trivial Aggregate. We thus extract, for each row and params, a branching function from join keys to Aggregate, and reduce (merge::Merge) these branching functions together for all rows in a tablet.

Two key insights underlie Coppice.

The first is that backtracking search over join keys represented as bounded arrays of bits gives us enough finite domain powers to weakly emulate logic programming. That's not a lot, but enough to automatically build a bit-trie index from an opaque function.

The second is that many analytical queries use only hierarchical joins (a well known fact), and that writing these queries as regular code implicitly gives us the tree necessary for Yannakakis's algorithm (maybe less well known).

In short, the Coppice is just an API trick to coerce Rust covers into writing plain code that can be executed with a version of Yannakakis's algorithm simplified for the hierarchical subset of acyclic queries.

Dependencies

~1.3–2MB
~39K SLoC