Solved – Dealing with lots of ties in kNN model

k-nearest-neighbour

I have a large data set (400k rows × 60 columns) that I'm trying to use to build a kNN model. I'm using the caret package version of kNN and the forward.search method from the FSelector package to eliminate variables via cross-validation. My problem is that once I use more than 20k rows of data, I get a message about there being too many ties.

Currently I'm only checking k-values between 1 and 19 (and only odd values, since they supposedly reduce the risk of ties) and only using variables with more than 2 levels.

Are there any other tweaks for feeding large chunks of data into a kNN model?

EDIT: This is a regression problem, not a classification problem.

Best Answer

In some situations you have many data items that might be considered tied in distance, especially if your data is discrete (e.g. your matrix is made up of integers).

A "hack" that might work is to add a very small amount of pseudo-random noise to the data. This reduces the number of data items that happen to be equidistant. Note that the noise should be small enough not to bias the results, but large enough to break the ties.
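A minimal sketch of this hack in R, on made-up integer data (the matrix, seed, and noise scale are illustrative assumptions, not from the original post). The noise half-width is kept orders of magnitude smaller than the grid spacing of the data:

```r
# Break distance ties by adding tiny uniform noise to discrete data.
set.seed(42)

# Hypothetical discrete data: 100 points on a small integer grid,
# which guarantees many duplicated pairwise distances.
x <- matrix(sample(1:5, 200, replace = TRUE), ncol = 2)

# Add uniform noise far smaller than the spacing between distinct
# values (here the spacing is 1, the noise half-width is 1e-6).
x_jittered <- x + matrix(runif(length(x), -1e-6, 1e-6), nrow = nrow(x))
# Base R's jitter() does the same thing column-wise:
# x_jittered <- apply(x, 2, jitter, amount = 1e-6)

# Count tied (duplicated) pairwise distances before and after.
n_ties <- function(m) {
  d <- as.vector(dist(m))
  length(d) - length(unique(d))
}
n_ties(x)           # many duplicated distances on the integer grid
n_ties(x_jittered)  # far fewer (typically none) after jittering
```

Because the perturbation is tiny relative to the data's scale, the fitted regression surface is essentially unchanged while the tie warnings go away.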