Solved – How does Scikit Learn resolve ties in the KNN classification

classification, k nearest neighbour, scikit learn, self-study

I have a multi-class classification problem (5 classes) for which I'm using Scikit Learn's k nearest neighbour classifier. With more than two classes, choosing an odd number for k won't prevent classification ties.

So how does Scikit Learn resolve ties in k nearest neighbour classification? I can't seem to find this anywhere on the internet.

I need this for an exam assignment, so a quick answer, ideally with a source, would be much appreciated 🙂

Best Answer

From the documentation for KNeighborsClassifier:

Warning: Regarding the Nearest Neighbors algorithms, if it is found that two neighbors, neighbor k+1 and k, have identical distances but different labels, the results will depend on the ordering of the training data.
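
This warning concerns which points get selected as the k neighbors in the first place. A toy one-dimensional sketch (plain Python with a hypothetical helper, not sklearn's actual implementation) shows how a stable sort on distances makes that selection depend on the order of the training data when two candidates are equidistant:

```python
def k_nearest(train_points, query, k):
    # Hypothetical helper for illustration only, not sklearn's code.
    # Python's sort is stable: when two training points are equidistant
    # from the query, the one that appears earlier in the training data
    # wins the contested slot.
    dists = [abs(p - query) for p in train_points]
    order = sorted(range(len(train_points)), key=lambda i: dists[i])
    return order[:k]

train = [0.0, 2.0, -2.0]          # distances 0, 2, 2 from the query 0.0
print(k_nearest(train, 0.0, 2))   # -> [0, 1]: point 2.0 beats equidistant -2.0

train_swapped = [0.0, -2.0, 2.0]  # same points, different training order
print(k_nearest(train_swapped, 0.0, 2))  # -> [0, 1]: now index 1 is -2.0
```

Reordering the training data changes which of the equidistant points ends up in the neighbor set, exactly as the warning says.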

To see exactly what happens, we'll have to look at the source. There you can see that, in the unweighted case, KNeighborsClassifier.predict ends up calling scipy.stats.mode, whose documentation says:

Returns an array of the modal (most common) value in the passed array.

If there is more than one such value, only the smallest is returned.

So, in the case of a tie among the most frequent classes, the prediction will be whichever of the tied classes has the smallest label value.

Digging a little deeper, the neigh_ind array passed to mode is the result of calling the kneighbors method, which (though its documentation doesn't say so) appears to return neighbors sorted by distance. That ordering, together with the ordering of the training data, decides which equidistant points make the cut as the k neighbors in the first place, which is exactly the situation the warning above describes; but this behaviour isn't documented and I'm not 100% sure it always holds.
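
To make the unweighted prediction path concrete, here is a minimal plain-Python sketch: collect the labels of the k nearest neighbors, then take their mode. The names are hypothetical and this is not sklearn's code; the tie-break (in the scipy versions I checked, mode keeps the smallest of the tied values, since it scans the unique values in sorted order) is imitated with min:

```python
from collections import Counter

def knn_vote(neighbor_labels):
    # Sketch of an unweighted KNN vote, not sklearn's implementation.
    # Mimics scipy.stats.mode's tie-break: among the most common
    # labels, keep the smallest one.
    counts = Counter(neighbor_labels)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)

# 5 neighbors; classes 2 and 4 tie with two votes each:
print(knn_vote([4, 2, 4, 2, 0]))  # smallest tied label wins -> 2
```

Note that with weights='distance' the vote is weighted by inverse distance, so exact ties of this kind become much rarer in practice.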