[Math] How to efficiently create balanced KD-Trees from a static set of points

algorithmsoptimizationrecursive algorithmstrees

From Wikipedia, KD-Trees:

Alternative algorithms for building a balanced k-d tree presort the data prior to building the tree. They then maintain the order of the presort during tree construction and hence eliminate the costly step of finding the median at each level of subdivision. [..] This algorithm presorts n points in each of k dimensions

I fail to understand how that is actually done. Consider this example:

Lets say I have 5 points in an array.

0 = (4, 7)
1 = (2, 9)
2 = (5, 4)
3 = (3, 6)
4 = (2, 1) 

I suppose we now want to create 2 sorted arrays, one by X and one by Y ?

Sort them by X     Sort them by Y
1 = (2, 9)         4 = (2, 1) 
4 = (2, 1)         2 = (5, 4)
3 = (3, 6)         3 = (3, 6)
0 = (4, 7)         0 = (4, 7)
2 = (5, 4)         1 = (2, 9)

We start creating the KD-Tree. Lets split by X axis, at the first step. We select the median of the points, 3 = (3, 6), so the Left and right trees will like this:

Left  (first half)         
1 = (2, 9)   
4 = (2, 1)     

Right  (second half)
0 = (4, 7)      
2 = (5, 4)  

Now we want to sort the Left and the Right trees by Y axis. How are we supposed to make use of the points that we sorted previously by Y axis ?

The only solution in my eyes is to re-sort the Left and Right trees respectively, by Y axis. What is Wikipedia talking about ?

Best Answer

If you want to make use of pre-sorted lists, the the idea is to use the fact that the median will be located at $i = floor(\frac{N}{2})$, where $N$ is the length of the list you're looking for.

The nice thing about splitting the list is that when we recurse, we can simply use the sub-lists from $[0, i)$ and $[i, N)$ to find the median again.

However, as you noticed, we need to be smart about splitting the two lists in the K-D tree. The general idea is if we pick a point $P = (x_0, y_0)$ to split with, we need to split both the $X$-sorted list with respect to $x_0$ and the $Y$ sorted list with respect to $x_0$. (We split both according to $x_0$, not one with $x_0$ and one with $y_0$)

In the given example,

0 = (4, 7)
1 = (2, 9)
2 = (5, 4)
3 = (3, 6)
4 = (2, 1) 

When we split with the point 4 = (2, 1), the

$X$ list splits as: (it splits at $x_0 = 2$)

1 = (2, 9)       
4 = (2, 1)
----------  
3 = (3, 6)
0 = (4, 7) 
2 = (5, 4)

Clearly, the two sub-lists are still sorted by $x$

the $Y$ list splits at $x_0 = 2$

4 = (2, 1) 
1 = (2, 9)
----------
2 = (5, 4)
3 = (3, 6)
0 = (4, 7)

Notice that the two subsections of the $Y$ are still sorted with respect to the y coordinate, but they were split along the x coordinate.

Implementing this is exactly like implementing the reverse of the merge operation from merge sort.

Now that you have the $X$ and $Y$ sublists for the recursion, you can continue building the K-D tree.

Related Question