#insert #element #sorting #delete #set #rotated #√n

rotated-array-set

An ordered set supporting O(1) rank and O(√n) inserts and deletes

2 releases

0.1.1 Jan 9, 2020
0.1.0 Jan 9, 2020

#1495 in Data structures

Apache-2.0

68KB
979 lines

rotated-array-set

A 2-Level Rotated Array Implemented in Rust

This repository contains implementations, unit tests, and benchmark code for the "2-level rotated array" structure, first published in Munro and Suwanda's 1979 paper "Implicit Data Structures for Fast Search and Update" (which also introduced the much better-known beap data structure). This structure is further developed and discussed in "Implicit Data Structures for the Dictionary Problem" (1983) and "Succinct Dynamic Data Structures" (2001). (The latter generalizes the idea to the dynamic array abstract data type, rather than a sorted array.)

The theoretical advantage of a 2-level rotated array over an ordinary sorted array is that it provides the same search performance (O(log n)), with much better insert and delete performance (O(√n), compared to O(n) for a sorted array), in exactly the same amount of space (i.e., no more than the data itself). (For the purely implicit structure, inserts and deletes are O(√n * log n), but can be reduced to O(√n) using O(√n) extra space, a negligible amount. This implementation follows this approach.) This O(√n) insert/delete performance is considerably worse than the O(log N) performance of a balanced tree (e.g., a red-black tree or B-tree), but the 2-level rotated array takes up a fraction of the space of a balanced tree (less than half the space of Rust's BTreeSet).

Here is a detailed description of the 2-level array data structure, from "A compact data structure for representing a dynamic multiset":

We use a two-level rotated array to store the elements, with additional pointers to point to the beginning of each rotated array. A rotated array is an arbitrary cyclic shift of a sorted array. In a two-level rotated array, the elements are stored in an array divided into ⌈√(2n) blocks, where the ith block is a rotated array of length i. All the elements in the ith block are less than or equal to any element in the i + 1st block. We store the starting position of each block (rotated array) explicitly. To search for an element e, we first perform a binary search to find two adjacent blocks where the first (last) occurrence of e can lie. We then perform a binary search in those two blocks to find the answer. Thus element-based searches can be supported in O(lg n) worst-case time. The locator of an element can be realized by recording the block in which the element is stored together with the offset from the beginning of the block. Given a locator, the locator to its predecessor or successor can be easily obtained in O(1) worst-case time. To insert an element e, we first find the location where e should be inserted (using upper-bound). We perform the insertion into the block by removing the last element of the rotated array, and shifting all the elements between the new element and the last element, one position forward. We then have to update every block following the block in which the insertion was made as follows: remove the last element of the current rotated array and insert the last element from the previous rotated array which now becomes the new first element of the rotated array. Also, we update the starting position of each block in O(1) time per block. Thus, insertions can be supported in O(√n) worst-case time. Deletions can be performed analogously.

In practice, this data structure suffers from a problem common to implicit structures in general (such as the binary heap and heapsort): it is memory-efficient without being particularly cache-efficient. That is, it uses only a small fraction of the data transferred by a cache miss or a page fault, and so it fails to realize the efficiencies implied by asymptotic analysis. That said, it still improves on the insert/delete performance of a plain sorted array by 1-3 orders of magnitude (although it is slower than Rust's BTreeSet by 2-3 orders of magnitude), so it may be a good choice where memory efficiency or indexing performance is critical but the insert/delete performance of an array is unacceptable. (Note that it is possible to augment a balanced tree with subtree size information to achieve O(log N) indexing [and weight-balanced trees already contain this information], but Rust does not have such a data structure in its standard library.)

A dynamic array implementation of the same data structure (roughly a drop-in replacement for Vec, except that it doesn't support deref to a slice) is available at https://github.com/senderista/rotated-vec.

This implementation is in Rust, and is benchmarked using the Criterion benchmark framework. Preliminary benchmarks are here.

Dependencies

~72KB