2 releases
0.1.1 | Mar 9, 2024 |
---|---|
0.1.0 | Oct 11, 2020 |
#747 in Data structures
88KB
1K
SLoC
Rust-Crater
This is a data structures library, named after my C data structures library.
Rust has a lot more builtin data structures compared to C, so much of the stuff is unnecessary. Vectors, hashtables, binary heaps, and argument parsing are already provided.
There are even some very rare vector functions like select_nth_unstable
, nearly identical to my
vec_ith
+ vec_partition_with_median
: both of these provide a way to find the nth element based on some
comparison function in an unsorted vector in linear time, and then partition the vector so the nth element
is at the nth index, every element preceding it is <=
, and every element following it is >=
.
The difference is my vec_partition_with_median
is a three way partitioning function, meaning it separates
the vector into <
, =
, and >
regions, but this is actually overkill for KD Trees, which were the
primary focus here.
Currently, this library implements VERY generic KD Trees, Fibonacci heaps, and minmax heaps.
There are some good KD Tree libraries already available for rust, but none of them are sufficiently generic for my needs. For simple 2D/3D KD Trees with ints/floats, other libraries currently offer more functionality. For KD Trees that don't live in Cartesian space or have exotic scalar types, this library is likely a good choice.
Kiddo is a very popular KD tree library for Rust, it is very fast but it
also has some very extreme shortcomings (Only float coordinates are well supported, integral coordinates
have diminished functionality like not supporting immutable trees, not supporting n_nearest_within
,
and requiring obsequeous wrapping with Fixed
types; query bounds are EXCLUSIVE which is not documented,
and the query functionality is not as featureful as our k_closest
(only k closest OR all within are
supported generally, inner bounds are never supported, region bounds and queries are not supported),
points cannot be retrieved from the tree forcing data duplication, the tree cannot be used as a map without
manually creating a separate vector, there are several redundant generic arguments).
kd-tree is another very popular KD tree library. Unlike Kiddo
it
does not claim to be the fastest, and also unlike Kiddo
it appears to have a very usable and flexible
API. It supports generic point types, though not generic topologies, and it also has support for
popular performance libraries Rayon
and nalgebra
built in.
This crate is a good choice for priority queues if you don't
need the advanced flexibilty of this crate's Fib heaps or need a more widely adopted and supported library.
Our Fib heaps are pretty low level and best suited to implementing other libraries on top of, as they were
created with performing A*
on KD trees in mind.
Features
KD Trees can be made with an arbitrary topology (2D, 3D+, Cartesian, embedded on the surface of a sphere/torus/etc).
This is facillitated by having several traits associated with the KdTree
struct:
-
KdPoint
: Represents a point in the tree. Types implementing this trait only need to have two methodsKdPoint::cmp
: Given a layer in the tree, compare two points in that layer. For cartesian points, this is as simple as getting thelayer%dim
coordinate, but if the points are embedded in the surface of a sphere for example it could be more natural to compare based on angle iflayer
is odd and based on longitude iflayer
is even.KdPoint::sqdist
: Given two points, compute the squared distance between them.KdPoint
does not have any restrictions on the type of the coordinates of the points it represents; it doesn't even assume points are represented with coordinates. Instead, the only associated type isKdPoint::Distance
, which is returned byKdPoint::sqdist
and only must beOrd
.
-
KdRegion
: Represents the bounds of the entire tree or some subtree. Types implementing this trait only need to have four methods. They should be able to represent a single point, but it is not necessary to be able to represent an empty region (it's wrapped inOption
) and it's ok if the region overestimates its size as long as it does not underestimate its (eg it can return a lower number than the truth formin_sqdist
but not a higher number)KdRegion::split
: Given a region, a point in the region, and a layer in the tree, split the region into the two subregions defined by that point in that layer. For example, if the KD tree is 2D Cartesian, even layers might split it horizontally and odd layers vertically. In general terms, this might involve chopping the region with a line/plane/hyperplane, or just splitting a convex set into two convex parts. The outer bounds on the tree are stored explicitly, but the bounds of every subtree are calculated laxly during traversal (ie they will generally be overestimates).KdRegion::single_point
: Create a region from a single pointKdRegion::extend
: Extend a region so that in includes an additional point. Frequently, regions will be AABBs so this is insanely simple to implement, but it could be much more complicated in general.KdRegion::min_sqdist
: Return a number <= the minimum distance between a given point and this region. In particular, points inside the region, such as those added bysingle_point
andextend
, MUST return0
(or something <= every other return value). Generally, regions are convex and so linear combinations of interior points if applicable are in the region. Points not in the region should ideally return > 0 so that the search can be maximally pruned, but not > their actual distance or of course the search could be incorrect.
There are also default implementations of these provided for Cartesian KD trees in arbitrary dimension with
(almost) any coordinate type (must be Ord + NumRef + Clone
): CuPoint
and CuRegion
.
External Crates:
num-traits
: This crate is certainly no Numerical Prelude (Haskell), but it's an aboslute godsend for making
mathematically generic traits without having to put 200+ type bounds relating to overloading numeric operators
for different permutations of reference types. Basically impossible to overload these opertors without it.
rand
: It's extremely weird that C++ has a builtin rand library but Rust doesn't. Rust typically makes
much better decisions with what should and shouldn't enter the language. That said, this random number
crate is INSANELY powerful and good, even moreso than Quickcheck (Haskell). Overloading the Uniform
random distribution to generate CuPoint<T, N>
generically took like 30 seconds.
Dependencies
~335–460KB