Solved – How to estimate the leafsize of the kd-tree

cart, clustering, machine learning, scipy, spatial

The kd-tree implementation provided by the SciPy Python library takes a leafsize parameter, that is, the maximum number of points a leaf node can hold. It defaults to 10.

Are there methods to estimate a good value for the leafsize parameter, so that the data is better distributed and no leaf node ends up holding a single point?

https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.KDTree.html

scipy.spatial.KDTree(data, leafsize=10)
# leafsize: the number of points at which the algorithm switches over to brute-force. Has to be positive.

Best Answer

With this setting of 10, you should never have a leaf with a single point, unless your data set consists of exactly one point.
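One way to check this claim empirically is to walk the tree and collect the size of every leaf. The following is only a sketch under some assumptions: it uses scipy.spatial.cKDTree (whose `tree` attribute exposes the nodes) with `balanced_tree=True`, that is, median splits; the random test data and the helper name `leaf_sizes` are made up for illustration. Data with many duplicate coordinates may behave differently.

```python
import numpy as np
from scipy.spatial import cKDTree


def leaf_sizes(node):
    """Collect the number of points held by every leaf below `node`."""
    # cKDTreeNode marks leaves with split_dim == -1;
    # `children` is the number of data points under the node.
    if node.split_dim == -1:
        return [node.children]
    return leaf_sizes(node.lesser) + leaf_sizes(node.greater)


# Hypothetical test data: 1000 random 3-D points (no duplicate coordinates expected).
rng = np.random.default_rng(0)
points = rng.random((1000, 3))

# balanced_tree=True asks for median splits instead of midpoint splits.
tree = cKDTree(points, leafsize=10, balanced_tree=True)
sizes = leaf_sizes(tree.tree)

print("leaves:", len(sizes))
print("smallest leaf:", min(sizes))
print("largest leaf:", max(sizes))
```

Running the same check with `balanced_tree=False` (midpoint splits) is a quick way to see how much the leaf sizes depend on the split rule.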

Because the splits are balanced in size, a node is only split if it holds more than 10 points, so each of its two children receives at least 5. With the maximum set to 10, the minimum leaf size is therefore 5 (unless there are fewer than 5 data points in total).
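The counting argument can be reproduced without building a tree at all. The sketch below assumes an idealised median split, where a node with n > leafsize points always yields children of floor(n/2) and ceil(n/2) points; the function name `leaf_size_range` is hypothetical and this is not SciPy's own code.

```python
def leaf_size_range(n_points, leafsize):
    """Return (smallest, largest) leaf size, assuming every node holding more
    than `leafsize` points is split into floor(n/2) and ceil(n/2) points."""
    if n_points <= leafsize:
        # Small enough: this node becomes a leaf.
        return n_points, n_points
    half = n_points // 2
    lo_a, hi_a = leaf_size_range(half, leafsize)
    lo_b, hi_b = leaf_size_range(n_points - half, leafsize)
    return min(lo_a, lo_b), max(hi_a, hi_b)


for n in (11, 100, 1000):
    print(n, leaf_size_range(n, leafsize=10))
```

Under the same assumption, a general leafsize L gives leaves of at least floor((L + 1) / 2) points, since the smallest node that is still split holds L + 1 points.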