| Version | Released |
|---------|----------|
| 0.2.4 | Mar 24, 2022 |
| 0.2.3 | Jan 14, 2022 |
| 0.2.1 | Nov 17, 2021 |
| 0.1.7 | Nov 14, 2021 |
| 0.1.2 | Apr 11, 2021 |
A kd-tree library with high performance and const-generic dimensions. A fork of kdtree, with significant performance improvements and extra features. Thanks and kudos to mrhooray for the original kdtree library, on which kiddo is based.

Differences vs kdtree
The most significant structural difference is that kiddo has been written with the number of dimensions as a const generic parameter. This has a few benefits: many runtime errors (such as `WrongDimension` errors) become compile-time errors, since the dimensionality is known at compile time and every method that takes a point expects an array or slice of length K. Operations that previously required `Vec`s (such as the `distance_to_space()` function) now operate on arrays/slices, eliminating costly heap allocations.
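As an illustrative sketch of what the const generic dimensionality buys (the types here are simplified stand-ins, not kiddo's actual internals), the point's length becomes part of the type, so a wrong-dimension point fails to compile rather than erroring at runtime:

```rust
// Illustrative sketch only (not kiddo's real internals): with the
// dimension K as a const generic, every point is an [A; K] array,
// so dimensionality mismatches are caught by the compiler.
struct Tree<A, const K: usize> {
    points: Vec<[A; K]>,
}

impl<A, const K: usize> Tree<A, K> {
    fn new() -> Self {
        Tree { points: Vec::new() }
    }

    fn add(&mut self, point: [A; K]) {
        self.points.push(point);
    }

    fn size(&self) -> usize {
        self.points.len()
    }
}

fn main() {
    let mut tree: Tree<f64, 3> = Tree::new();
    tree.add([1.0, 2.0, 3.0]); // OK: [f64; 3]
    // tree.add([1.0, 2.0]);   // compile error: expected [f64; 3], found [f64; 2]
    assert_eq!(tree.size(), 1);
}
```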
kiddo provides a specialised `nearest_one()` method for queries that need only the single nearest element. This method avoids any heap allocations and is consequently much faster than calling `nearest()` for a single point.
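Conceptually, a single-nearest query only needs to track one best candidate, so no heap is required at all. A simplified std-only sketch of that idea (a linear scan for clarity, not kiddo's actual tree traversal):

```rust
// Simplified illustration of why nearest-one needs no heap: track a
// single best (distance, item) pair. Kiddo's real method walks the
// tree, but the per-candidate bookkeeping is the same.
fn nearest_one(points: &[([f64; 2], usize)], query: &[f64; 2]) -> (f64, usize) {
    let mut best = (f64::INFINITY, usize::MAX);
    for (point, item) in points {
        let dist: f64 = point
            .iter()
            .zip(query.iter())
            .map(|(a, b)| (a - b) * (a - b))
            .sum();
        if dist < best.0 {
            best = (dist, *item);
        }
    }
    best
}

fn main() {
    let points = [([0.0, 0.0], 0), ([3.0, 3.0], 1)];
    // Squared distances from [2, 2]: 8.0 to item 0, 2.0 to item 1.
    assert_eq!(nearest_one(&points, &[2.0, 2.0]), (2.0, 1));
}
```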
kiddo extends kdtree's query API with new query methods such as `best_n_within_into_iter()`. These are useful for performing queries such as "what are the tallest 10 mountains within 10 degrees of London", "which are the largest 100 settlements within 5 degrees of New York", or "find the brightest 100 stars within a 2 degree radius of this point on the sky". This requires your stored element type to implement `Ord`, with smaller values being "better". Bringing this functionality inside kiddo's implementation, rather than performing an initial `within()` query followed by filtering the results, can be over 10x faster, as can be seen in the benchmarks below.
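The underlying pattern can be sketched with std types only (the candidate list and function names here are illustrative, not kiddo's API): filter to the radius while keeping a max-heap capped at n, so only the n best (smallest, per `Ord`) items survive.

```rust
use std::collections::BinaryHeap;

// Illustrative std-only sketch of the "best n within distance" idea:
// items inside the radius go into a max-heap capped at n, so the
// current worst (largest) item is evicted whenever the heap overflows.
// Smaller values are "better", matching the Ord requirement above.
fn best_n_within(candidates: &[(f64, u32)], radius: f64, n: usize) -> Vec<u32> {
    let mut best: BinaryHeap<u32> = BinaryHeap::new();
    for &(dist, item) in candidates {
        if dist <= radius {
            best.push(item);
            if best.len() > n {
                best.pop(); // evict the current worst (largest) item
            }
        }
    }
    best.into_sorted_vec()
}

fn main() {
    // (distance from query point, item value)
    let candidates = [(0.5, 40), (1.5, 10), (2.0, 30), (9.0, 1), (1.0, 20)];
    // Item 1 is outside radius 3.0; of the rest, the 2 smallest win.
    assert_eq!(best_n_within(&candidates, 3.0, 2), vec![10, 20]);
}
```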
kdtree's `within()` function uses a `BinaryHeap` to ensure that the results are ordered by distance from the query point. This sorting can be expensive, especially with a large number of elements. Kiddo's `within_unsorted()` method returns items in arbitrary order; for use cases that don't need the response to be sorted, this is much faster.
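The difference between the two comes down to a final sort by distance, which a short sketch makes concrete (illustrative values; note that `f64` distances need `partial_cmp` because `f64` does not implement `Ord`):

```rust
// The extra work that a sorted query does over an unsorted one is
// a final sort of (distance, item) pairs, sketched here.
fn sort_by_distance(mut results: Vec<(f64, usize)>) -> Vec<(f64, usize)> {
    // f64 is only PartialOrd, so compare via partial_cmp.
    results.sort_unstable_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
    results
}

fn main() {
    let unsorted = vec![(8.0, 2), (0.5, 0), (2.0, 1)]; // arbitrary order
    assert_eq!(
        sort_by_distance(unsorted),
        vec![(0.5, 0), (2.0, 1), (8.0, 2)] // ordered by distance
    );
}
```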
Some small performance gains come from a technique used by some Python binary-heap libraries: rather than `pop()`ing and then immediately `push()`ing to a `BinaryHeap`, it is quicker in this scenario to replace the element at the top of the heap and then sift the new element down.
The node structure has been refactored to use an `enum` for the aspects of the nodes that differ between stem and leaf nodes, rather than every node carrying all of these fields as `Option`s. This has two benefits. Firstly, stronger correctness guarantees: a type system as strong as Rust's allows us to eliminate the possibility of inconsistent state by design. Secondly, slightly better memory usage (also helped by using arrays rather than `Vec`s for things such as node min/max bounds, made possible by the const generic dimensionality).
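A hypothetical sketch of the idea (field names are illustrative, not kiddo's actual node layout): the enum makes "stem or leaf" a fact the compiler enforces, instead of a convention over a collection of `Option`s.

```rust
// Hypothetical sketch, not kiddo's actual node layout: with an enum,
// a node is *either* a stem or a leaf, and stem-only fields simply
// don't exist on leaves (and vice versa) - no Options to check.
enum Node<A, T, const K: usize> {
    Stem {
        split_value: A,
        split_dim: usize,
        left: Box<Node<A, T, K>>,
        right: Box<Node<A, T, K>>,
    },
    Leaf {
        points: Vec<[A; K]>,
        items: Vec<T>,
    },
}

impl<A, T, const K: usize> Node<A, T, K> {
    fn is_leaf(&self) -> bool {
        matches!(self, Node::Leaf { .. })
    }
}

fn main() {
    let leaf: Node<f64, usize, 2> = Node::Leaf {
        points: vec![[0.0, 0.0]],
        items: vec![0],
    };
    assert!(leaf.is_leaf());

    let stem: Node<f64, usize, 2> = Node::Stem {
        split_value: 0.5,
        split_dim: 0,
        left: Box::new(Node::Leaf { points: vec![], items: vec![] }),
        right: Box::new(Node::Leaf { points: vec![], items: vec![] }),
    };
    assert!(!stem.is_leaf());
}
```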
```toml
[dependencies]
kiddo = "0.2.4"
```

Add points to the kd-tree and query the nearest n points with a distance function:
```rust
use kiddo::KdTree;
use kiddo::ErrorKind;
use kiddo::distance::squared_euclidean;

let a: ([f64; 2], usize) = ([0f64, 0f64], 0);
let b: ([f64; 2], usize) = ([1f64, 1f64], 1);
let c: ([f64; 2], usize) = ([2f64, 2f64], 2);
let d: ([f64; 2], usize) = ([3f64, 3f64], 3);

let mut kdtree = KdTree::new()?;

kdtree.add(&a.0, a.1)?;
kdtree.add(&b.0, b.1)?;
kdtree.add(&c.0, c.1)?;
kdtree.add(&d.0, d.1)?;

assert_eq!(kdtree.size(), 4);

assert_eq!(
    kdtree.nearest(&a.0, 0, &squared_euclidean).unwrap(),
    vec![]
);
assert_eq!(
    kdtree.nearest(&a.0, 1, &squared_euclidean).unwrap(),
    vec![(0f64, &0)]
);
assert_eq!(
    kdtree.nearest(&a.0, 2, &squared_euclidean).unwrap(),
    vec![(0f64, &0), (2f64, &1)]
);
assert_eq!(
    kdtree.nearest(&a.0, 3, &squared_euclidean).unwrap(),
    vec![(0f64, &0), (2f64, &1), (8f64, &2)]
);
assert_eq!(
    kdtree.nearest(&a.0, 4, &squared_euclidean).unwrap(),
    vec![(0f64, &0), (2f64, &1), (8f64, &2), (18f64, &3)]
);
assert_eq!(
    kdtree.nearest(&a.0, 5, &squared_euclidean).unwrap(),
    vec![(0f64, &0), (2f64, &1), (8f64, &2), (18f64, &3)]
);
assert_eq!(
    kdtree.nearest(&b.0, 4, &squared_euclidean).unwrap(),
    vec![(0f64, &1), (2f64, &0), (2f64, &2), (8f64, &3)]
);
```

Comparison with kdtree
Criterion is used to perform a series of benchmarks. Each action is benchmarked against trees that contain 100, 1,000, 10,000, 100,000 and 1,000,000 nodes, and charted below.
The Adding Items benchmarks are repeated against 2d, 3d and 4d trees. The 3d benchmarks are run with points of type `f32` and of type `f64`.
All of the remaining tests are only performed against 3d trees, for expediency. The trees are populated with random source data whose points are all on a unit sphere. This use case is representative of common kd-tree usages in geospatial and astronomical contexts.
The Nearest n Items tests query the tree for the nearest 1, 100 and 1,000 points at each tree size. The test for the common case of the nearest single point uses kiddo's `nearest_one()` method, which is optimised for this specific, common use case.
The results and charts below were created via the following process:
- Check out the `original-kdtree-criterion` branch. This branch is the same code as kdtree, with criterion benchmarks added that perform the same operations as the criterion tests in kiddo. For functions that are present in kiddo but not in kdtree, the criterion tests for kdtree contain extra code to post-process the results from kdtree calls to perform the same actions as the new methods in kiddo.
- Use the following command to run the criterion benchmarks for kdtree and generate NDJSON-encoded test results:

```shell
cargo criterion --message-format json > criterion-kdtree.ndjson
```

- Check out the `master` branch.
- Use the following command to run the criterion benchmarks for kiddo and generate NDJSON-encoded test results:

```shell
cargo criterion --message-format json --all-features > criterion-kiddo.ndjson
```

- The graphs are generated in Python using matplotlib. Ensure you have Python installed, as well as the matplotlib and ndjson Python libraries. Then run the following:
The following results were obtained with the above methodology on a machine with these specs:
- AMD Ryzen 5 2500X @ 3600MHz
- 32GB DDR4 @ 3200MHz
The results are stored inside this repo as `criterion-kdtree.ndjson`, should you wish to perform your own analysis.
Adding items to the tree
Kiddo generally has a very small performance lead over kdtree at larger tree sizes, with the two performing similarly on smaller trees.
Retrieving the nearest n items
Kiddo's `nearest_one()` method gives a huge performance advantage for single-item queries, with up to 9x faster performance. Kiddo's `nearest()` method also outperforms kdtree's.
Retrieving all items within a distance, sorted
Things look closer here at first glance, but the logarithmic nature of the charted data may obscure the fact that kiddo is often up to twice as fast as kdtree here.
Retrieving all items within a distance, unsorted
kdtree does not have a `within_unsorted()` method, so here we are comparing kiddo's `within_unsorted()` to kdtree's `within()`, with kiddo up to 5x faster on the million-item tree.
Retrieving the best n items within a specified distance
Kiddo's performance advantage here ranges from twice as fast for hundred-item trees up to as much as 20x faster for trees with a million items.
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.