#kdtree #priority-queue #fibonacci-number #minmax-heap #fibonacci-heap

nightly crater

Very generic containers including KD trees, fibonacci heaps, minmax heaps,

2 releases

0.1.1 Mar 9, 2024
0.1.0 Oct 11, 2020

#747 in Data structures

MPL-2.0 license

88KB
1K SLoC

Rust-Crater

GitHub CI Documentation Coverage

This is a data structures library, named after my C data structures library.

Rust has a lot more builtin data structures compared to C, so much of the stuff is unnecessary. Vectors, hashtables, binary heaps, and argument parsing are already provided.

There are even some very rare vector functions like select_nth_unstable, nearly identical to my vec_ith + vec_partition_with_median: both of these provide a way to find the nth element based on some comparison function in an unsorted vector in linear time, and then partition the vector so the nth element is at the nth index, every element preceding it is <=, and every element following it is >=. The difference is my vec_partition_with_median is a three way partitioning function, meaning it separates the vector into <, =, and > regions, but this is actually overkill for KD Trees, which were the primary focus here.

Currently, this library implements VERY generic KD Trees, Fibonacci heaps, and minmax heaps.

There are some good KD Tree libraries already available for rust, but none of them are sufficiently generic for my needs. For simple 2D/3D KD Trees with ints/floats, other libraries currently offer more functionality. For KD Trees that don't live in Cartesian space or have exotic scalar types, this library is likely a good choice.

Kiddo is a very popular KD tree library for Rust, it is very fast but it also has some very extreme shortcomings (Only float coordinates are well supported, integral coordinates have diminished functionality like not supporting immutable trees, not supporting n_nearest_within, and requiring obsequeous wrapping with Fixed types; query bounds are EXCLUSIVE which is not documented, and the query functionality is not as featureful as our k_closest (only k closest OR all within are supported generally, inner bounds are never supported, region bounds and queries are not supported), points cannot be retrieved from the tree forcing data duplication, the tree cannot be used as a map without manually creating a separate vector, there are several redundant generic arguments).

kd-tree is another very popular KD tree library. Unlike Kiddo it does not claim to be the fastest, and also unlike Kiddo it appears to have a very usable and flexible API. It supports generic point types, though not generic topologies, and it also has support for popular performance libraries Rayon and nalgebra built in.

This crate is a good choice for priority queues if you don't need the advanced flexibilty of this crate's Fib heaps or need a more widely adopted and supported library. Our Fib heaps are pretty low level and best suited to implementing other libraries on top of, as they were created with performing A* on KD trees in mind.

Features

KD Trees can be made with an arbitrary topology (2D, 3D+, Cartesian, embedded on the surface of a sphere/torus/etc).

This is facillitated by having several traits associated with the KdTree struct:

  • KdPoint: Represents a point in the tree. Types implementing this trait only need to have two methods

    • KdPoint::cmp: Given a layer in the tree, compare two points in that layer. For cartesian points, this is as simple as getting the layer%dim coordinate, but if the points are embedded in the surface of a sphere for example it could be more natural to compare based on angle if layer is odd and based on longitude if layer is even.
    • KdPoint::sqdist: Given two points, compute the squared distance between them. KdPoint does not have any restrictions on the type of the coordinates of the points it represents; it doesn't even assume points are represented with coordinates. Instead, the only associated type is KdPoint::Distance, which is returned by KdPoint::sqdist and only must be Ord.
  • KdRegion: Represents the bounds of the entire tree or some subtree. Types implementing this trait only need to have four methods. They should be able to represent a single point, but it is not necessary to be able to represent an empty region (it's wrapped in Option) and it's ok if the region overestimates its size as long as it does not underestimate its (eg it can return a lower number than the truth for min_sqdist but not a higher number)

    • KdRegion::split: Given a region, a point in the region, and a layer in the tree, split the region into the two subregions defined by that point in that layer. For example, if the KD tree is 2D Cartesian, even layers might split it horizontally and odd layers vertically. In general terms, this might involve chopping the region with a line/plane/hyperplane, or just splitting a convex set into two convex parts. The outer bounds on the tree are stored explicitly, but the bounds of every subtree are calculated laxly during traversal (ie they will generally be overestimates).
    • KdRegion::single_point: Create a region from a single point
    • KdRegion::extend: Extend a region so that in includes an additional point. Frequently, regions will be AABBs so this is insanely simple to implement, but it could be much more complicated in general.
    • KdRegion::min_sqdist: Return a number <= the minimum distance between a given point and this region. In particular, points inside the region, such as those added by single_point and extend, MUST return 0 (or something <= every other return value). Generally, regions are convex and so linear combinations of interior points if applicable are in the region. Points not in the region should ideally return > 0 so that the search can be maximally pruned, but not > their actual distance or of course the search could be incorrect.

There are also default implementations of these provided for Cartesian KD trees in arbitrary dimension with (almost) any coordinate type (must be Ord + NumRef + Clone): CuPoint and CuRegion.

External Crates:

num-traits: This crate is certainly no Numerical Prelude (Haskell), but it's an aboslute godsend for making mathematically generic traits without having to put 200+ type bounds relating to overloading numeric operators for different permutations of reference types. Basically impossible to overload these opertors without it. rand: It's extremely weird that C++ has a builtin rand library but Rust doesn't. Rust typically makes much better decisions with what should and shouldn't enter the language. That said, this random number crate is INSANELY powerful and good, even moreso than Quickcheck (Haskell). Overloading the Uniform random distribution to generate CuPoint<T, N> generically took like 30 seconds.

Dependencies

~335–460KB