I have a multi-class classification problem (5 classes) for which I'm using scikit-learn's k-nearest-neighbour classifier. With more than two classes, choosing an odd number for k won't prevent classification ties.
So how does scikit-learn resolve ties in k-nearest-neighbour classification? I can't seem to find this anywhere on the internet.
I need this for an exam assignment, so quick answers, if possible with a source for your knowledge, are much appreciated 🙂
Best Answer
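The documentation for `KNeighborsClassifier` warns that if two neighbors, k and k+1, have identical distances but different labels, the result will depend on the ordering of the training data.

To see exactly what happens, we'll have to look at the source. You can see that, in the unweighted case, `KNeighborsClassifier.predict` ends up calling `scipy.stats.mode`, whose documentation says that if there is more than one most common value, only the first is returned. So, in the case of ties, the answer will be the class that happens to appear first in the set of neighbors.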
Digging a little deeper, the `neigh_ind` array that is used is the result of calling the `kneighbors` method, which (though the documentation doesn't say so) appears to return results in sorted order. So ties should be broken by choosing the class with the point closest to the query point, but this behavior isn't documented and I'm not 100% sure it always happens.
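To make the rule described above concrete, here is a minimal pure-Python sketch of an unweighted k-NN vote in which a tie goes to the class that appears first among the distance-sorted neighbors. It is not scikit-learn's or SciPy's code: the function name `knn_predict_first_tie` and the toy data are made up for illustration, and it simply assumes the "sorted neighbors, first mode wins" behavior discussed in this answer.

```python
from collections import Counter
from math import dist  # Python 3.8+


def knn_predict_first_tie(train_points, train_labels, query, k):
    """Unweighted k-NN vote; a tie goes to the tied class seen first
    among the neighbors sorted by distance (illustrative rule only)."""
    # Sort training indices by distance to the query. sorted() is stable,
    # so equidistant points keep their training-set order.
    order = sorted(
        range(len(train_points)),
        key=lambda i: dist(train_points[i], query),
    )
    neighbor_labels = [train_labels[i] for i in order[:k]]

    counts = Counter(neighbor_labels)
    best_count = max(counts.values())
    # Walk the neighbors from nearest to farthest and return the first
    # label that has the top vote count.
    for label in neighbor_labels:
        if counts[label] == best_count:
            return label


# Hypothetical 1-D toy data: with k=4 and a query at 0, classes "a" and "b"
# get two votes each, and the nearest neighbor belongs to "b".
points = [(1.0,), (2.0,), (3.0,), (4.0,), (10.0,)]
labels = ["b", "a", "a", "b", "c"]

prediction = knn_predict_first_tie(points, labels, query=(0.0,), k=4)
print(prediction)
```

Under this rule the 2-2 tie goes to "b", because the closest point has that label. Since the real tie-breaking isn't documented, it is worth fitting `KNeighborsClassifier` on a similar tied data set and checking what your installed version actually returns.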