#knn #neighbor #ml #prediction #data-points

ball-tree

Ball-tree implementation for K-nearest neighbors

7 releases (4 breaking)

0.5.1 Sep 24, 2024
0.5.0 Mar 22, 2024
0.4.0 Apr 6, 2023
0.3.0 Jun 3, 2021
0.1.1 Nov 12, 2018

Used in erdy

MIT license

A BallTree is a space-partitioning data structure that allows nearest neighbors to be found in logarithmic time in typical cases.

It does this by partitioning data into a series of nested bounding spheres ("balls" in the literature). Spheres are used because it is trivial to compute the distance between a point and a sphere (the distance to the sphere's center minus the radius). The key observation is that if a candidate neighbor is closer to the query point than a bounding sphere is, then that candidate is necessarily closer than every point located inside the sphere.
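
The point-to-sphere distance can be sketched in a few lines. This is only an illustration: the `Ball` type and the function names are hypothetical and are not taken from this crate, Euclidean distance is assumed, and clamping the result to zero for points inside the sphere is an added assumption.

```rust
/// A bounding sphere ("ball") in n-dimensional Euclidean space.
/// Hypothetical type for illustration; not this crate's API.
struct Ball {
    center: Vec<f64>,
    radius: f64,
}

/// Euclidean distance between two points of equal dimension.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "points must have the same dimension");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}

/// Lower bound on the distance from `point` to anything inside `ball`:
/// the distance to the center minus the radius, clamped at zero
/// (an assumption) so that points inside the ball get a bound of 0.
fn distance_to_ball(point: &[f64], ball: &Ball) -> f64 {
    (euclidean(point, &ball.center) - ball.radius).max(0.0)
}
```

Because the result is a lower bound on the distance to every point inside the sphere, one cheap computation can stand in for many exact distance computations.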

Graphically:


   A -  
   |  ----         distance(A, B) = 4
   |      - B      distance(A, S) = 6
    |       
     |
     |    S
       --------
     /        G \ 
    /   C        \
   |           D |
   |       F     |
    \ E         /
     \_________/

In the diagram, A is closer to B than to S, and because S bounds C, D, E, F, and G, it follows that A is necessarily closer to B than to any of those points, without computing exact distances to them.
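
The pruning step in the diagram can be sketched as follows. This is a simplified illustration, not this crate's implementation: the `Leaf` type and the `nearest` function are hypothetical, points are fixed to two dimensions, and a flat list of spheres stands in for the nested tree. The order of the list matters here, since a sphere can only be skipped once a good enough candidate has been found.

```rust
/// Hypothetical leaf: a bounding sphere plus the points it contains.
struct Leaf {
    center: [f64; 2],
    radius: f64,
    points: Vec<[f64; 2]>,
}

/// Euclidean distance between two 2-D points.
fn dist(a: &[f64; 2], b: &[f64; 2]) -> f64 {
    ((a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2)).sqrt()
}

/// Returns the nearest point, its distance, and the number of exact
/// point distances that had to be computed.
fn nearest(query: &[f64; 2], leaves: &[Leaf]) -> Option<([f64; 2], f64, usize)> {
    let mut best: Option<([f64; 2], f64)> = None;
    let mut evaluated = 0;
    for leaf in leaves {
        // Lower bound on the distance to any point inside this sphere.
        let bound = (dist(query, &leaf.center) - leaf.radius).max(0.0);
        if let Some((_, best_d)) = best {
            if bound >= best_d {
                continue; // nothing inside this sphere can beat the current best
            }
        }
        for p in &leaf.points {
            evaluated += 1;
            let d = dist(query, p);
            if best.map_or(true, |(_, bd)| d < bd) {
                best = Some((*p, d));
            }
        }
    }
    best.map(|(p, d)| (p, d, evaluated))
}
```

With a candidate at distance 4 and a sphere whose lower bound is 6, as in the diagram, the points inside the sphere are never examined individually.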

Ball trees are most commonly used as a form of predictive model where the points are features and each point is associated with a value or label. Thus, this implementation allows the user to associate a value with each point. If this functionality is unneeded, () can be used as the value type.

This implementation returns the nearest neighbors, their distances, and their associated values. Returning the distances allows the user to perform some sort of weighted interpolation of the neighbors for predictive purposes.
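
One possible form of weighted interpolation is inverse-distance weighting, sketched below. The function `idw_predict` is hypothetical and does not use this crate's API; it assumes the caller has already collected (distance, value) pairs from a nearest-neighbor query and that the values are numeric.

```rust
/// Inverse-distance-weighted average of neighbor values.
/// `neighbors` holds (distance, value) pairs; hypothetical helper,
/// not part of this crate.
fn idw_predict(neighbors: &[(f64, f64)]) -> Option<f64> {
    if neighbors.is_empty() {
        return None;
    }
    // An exact match (distance 0) would cause a division by zero,
    // so its value is returned directly.
    if let Some(&(_, value)) = neighbors.iter().find(|n| n.0 == 0.0) {
        return Some(value);
    }
    let mut weighted_sum = 0.0;
    let mut weight_total = 0.0;
    for &(distance, value) in neighbors {
        let weight = 1.0 / distance; // closer neighbors count for more
        weighted_sum += weight * value;
        weight_total += weight;
    }
    Some(weighted_sum / weight_total)
}
```

Other weighting schemes (for example, uniform weights or a kernel function of the distance) fit the same shape; the choice depends on the data.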

No runtime deps