Solved – K value vs Accuracy in KNN

accuracy · k nearest neighbour · machine learning · precision-recall

I am trying to learn KNN by working on the Breast Cancer dataset provided by the UCI repository. The dataset has 699 instances, with 9 continuous variables and 1 class variable.

I tested my accuracy on a cross-validation set. For K = 21 and K = 19, accuracy is 95.7%.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

neigh = KNeighborsClassifier(n_neighbors=21)
neigh.fit(X_train, y_train)
y_pred_val = neigh.predict(X_val)
print(accuracy_score(y_val, y_pred_val))

But for K = 1 I get accuracy = 97.85%, and for K = 3, accuracy = 97.14%.

I read:

Choice of k is very critical – a small value of k means that noise will have a higher influence on the result. A large value makes it computationally expensive and somewhat defeats the basic philosophy behind KNN (that points that are near might have similar densities or classes). A simple approach to select k is to set k = n^(1/2). (Here.)
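As a quick sanity check on the quoted sqrt heuristic (a sketch; the heuristic itself comes from the quote above, and the rounding to an odd value is a common convention to avoid ties in binary classification, not something the quote mandates):

```python
import math

# For the 699-instance dataset in the question, the k = n^(1/2)
# heuristic suggests a k in the mid twenties.
n = 699
k_heuristic = math.sqrt(n)  # about 26.4
print(round(k_heuristic))   # 26; often rounded to an odd 25 or 27
```

Note this heuristic is only a starting point; it says nothing about whether k = 26 actually performs better than k = 1 or k = 3 on your data.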

Which value of K should I consider for my model, and can you elaborate on the logic behind it?

Thanks in advance!

Best Answer

Nikolas is right. The way to go about it is to run cross-validation with different values of K, and choose the K that minimizes the cross-validation error.
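That search can be sketched with scikit-learn's `cross_val_score` (a minimal sketch: it uses the breast-cancer dataset bundled with scikit-learn rather than your exact UCI file, and the range of odd K values and the 5-fold split are arbitrary choices, not part of the original question):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# A breast-cancer dataset similar in spirit to the UCI one.
X, y = load_breast_cancer(return_X_y=True)

# Mean 5-fold cross-validation accuracy for each odd K.
scores = {}
for k in range(1, 31, 2):
    clf = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(clf, X, y, cv=5).mean()

# Pick the K with the highest cross-validated accuracy
# (equivalently, the lowest cross-validation error).
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 4))
```

Comparing K values on a single held-out split (as in the question) is noisy; averaging over the folds gives a more stable estimate of how each K generalizes.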
