Solved – error rates of kNN minimal for $k=1$


I am trying to find the best parameter $k$ for a $k$-nearest-neighbour classifier using cross-validation on several datasets. After computing and plotting the error rates, I noticed some strange behaviour for one of them.

The bias-variance decomposition roughly says that a model will not perform well on new data if it is either too complex or too simple. For kNN, small $k$ gives a complex model and large $k$ a simple one, so I would expect the error rate to decrease as $k$ increases from 1 up to some value $k = a$, and to increase again as $k$ grows beyond $a$. Plotted as a function of $k$, the error rate should therefore look roughly like a smiling smiley (a U shape).
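For reference, a minimal sketch of this experiment, assuming a scikit-learn-style workflow (the post does not name its tools, and `make_classification` is just a hypothetical stand-in for the real data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-in dataset; the poster's real data is not available.
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

ks = range(1, 52, 2)  # odd k values to avoid ties in the majority vote
cv_errors = []
for k in ks:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=10)
    cv_errors.append(1.0 - scores.mean())  # error rate = 1 - accuracy

for k, err in zip(ks, cv_errors):
    print(f"k={k:2d}  CV error={err:.3f}")
```

On a typical dataset, plotting `cv_errors` against `ks` gives the expected U shape.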

The plot of error rates for this particular dataset turns out to show exactly the opposite: a sad smiley (an inverted U), with the minimal error rate at $k = 3$. I cannot understand what is going on here. The dataset does not seem that special either; it might even be close to linearly separable. That is why I find it so strange that such a complex model comes out best under cross-validation.

Could anybody help me understand what might be going on here?

Best Answer

The data turned out to be generated by a cosine-like function sampled at low density. This lets the nearest-neighbour classifier perform well for small $k$ (it follows the cosine, because it takes into account the points that lie far inside the "other-class zone") and for large $k$ (it matches an almost linear boundary, because it effectively ignores all points on the "other side"). Everything in between confuses the classifier: depending on which points end up in the training set and which in the test set, the decision boundary changes a lot. As a result, both the bias and the variance are higher for moderate $k$ than for very small or very large $k$.
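One way to probe this explanation is to simulate a wiggly (cosine-like) class boundary sampled at low density and run the same cross-validation sweep. All parameters below are illustrative guesses, not taken from the original dataset:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n = 80  # deliberately few points ("low density")

# Points scattered over a rectangle; class is decided by a cosine boundary.
X = rng.uniform(low=[0.0, -2.0], high=[4 * np.pi, 2.0], size=(n, 2))
y = (X[:, 1] > np.cos(X[:, 0])).astype(int)

for k in [1, 3, 7, 15, 31, 61]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=10)
    print(f"k={k:2d}  CV error={1 - scores.mean():.3f}")
```

With a seed and sampling density in this regime, small $k$ tracks the cosine and large $k$ smooths it toward a nearly straight boundary, while intermediate $k$ can do worse than either extreme, which is the inverted-U behaviour described above.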