k-NN with k=1 – Does It Always Imply Overfitting?

Tags: classification, k-nearest-neighbour, natural-language, overfitting

I have seen the claim that k-NN with k=1 always overfits, but other sources say that using k=1 is fine.

What about the risk of overfitting when using 1-NN in a binary classification problem where the explanatory variables are TF-IDF values (compared with the cosine measure)?

Best Answer

The short answer to your title question is "No". Consider an example with a binary target variable whose two classes are perfectly separated by the single explanatory variable:

[Plot: the explanatory variable x on the horizontal axis, with the two target classes occupying disjoint, well-separated ranges]

Clearly, 1-NN classification will work very well here and won't overfit. (The fact that there are other methods which will work equally well and may be simpler is irrelevant to the central point.)
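A minimal sketch of this point, using hypothetical toy data (not from the question) in which class 0 lies entirely below the class 1 values on a single explanatory variable:

```python
# Minimal 1-NN classifier on one explanatory variable.
def one_nn_predict(train_x, train_y, query):
    # Return the label of the training point closest to the query.
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    return train_y[nearest]

# Perfectly separated classes: class 0 below 5.0, class 1 above.
train_x = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]

# Held-out points are classified correctly despite k=1.
print(one_nn_predict(train_x, train_y, 2.5))  # -> 0
print(one_nn_predict(train_x, train_y, 7.5))  # -> 1
```

Any new point drawn from either class's range lands nearest to a training point of the correct class, so the 1-NN decision rule generalises here rather than overfitting.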

TF-IDF values are outside my areas of expertise, but in general, speaking loosely, the greater the separation between the classes of the target variable in the space spanned by the explanatory variables, the more effective 1-NN classification will be, regardless of the application area.
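To connect this to the question's setting, here is a hedged sketch of 1-NN over TF-IDF vectors compared with cosine similarity; the documents, labels, and the choice to recompute IDF with the query included are all made up for illustration:

```python
import math
from collections import Counter

# Hypothetical toy corpus with two well-separated classes.
docs = ["cheap pills buy now", "buy cheap pills",
        "meeting at noon", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]

def tfidf(corpus):
    # Build a sparse TF-IDF vector (dict of term -> weight) per document.
    tokenized = [doc.split() for doc in corpus]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def one_nn(query_doc):
    # For simplicity, recompute TF-IDF over training docs plus the query.
    vectors = tfidf(docs + [query_doc])
    query_vec, train_vecs = vectors[-1], vectors[:-1]
    # 1-NN: the training document with the highest cosine similarity.
    best = max(range(len(train_vecs)),
               key=lambda i: cosine(query_vec, train_vecs[i]))
    return labels[best]

print(one_nn("buy pills cheap"))  # -> spam
print(one_nn("noon lunch"))       # -> ham
```

Because the two classes share almost no vocabulary, each query's nearest neighbour under cosine similarity belongs to the correct class, which is exactly the separation condition described above.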
