Solved – Finding weights for variables in kNN

k-nearest-neighbour, machine-learning, optimization, r

I'm using Euclidean distance for kNN. I have labelled data; I took the logarithm of some variables to make them closer to normally distributed, and then scaled them all. Now I would like to multiply some variables by weights, compute the Euclidean distance, and train kNN. But how do I find those weights? My idea is to determine the centres of the classes (call this set C) and then optimise the kNN weights on C by random search. I don't think I can do it on a subset of the training set, because its size would be either too large, or too small to represent the dataset accurately.
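To make the idea concrete, here is a minimal sketch of a kNN classifier with a per-feature weight vector `w` (the name and the random-search setup are illustrative, not from any particular library):

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, w, k=3):
    """Predict the label of point x by majority vote among the k
    training points nearest under a weighted Euclidean distance."""
    # Multiplying each feature by its weight before taking distances
    # is equivalent to using sum(w_j^2 * (x_j - x'_j)^2) as the metric.
    d = np.sqrt((((X_train - x) * w) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

With `w` set to all ones this reduces to ordinary Euclidean kNN; the question is how to choose a better `w`.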

Do you have any other ideas?
I also don't think that tuning the parameters k and l amounts to the same approach as mine, or maybe it does?

Best Answer

Hastie and Tibshirani's paper on Discriminant Adaptive Nearest Neighbor Classification would be a good place to start.

A simple approach would be to choose the weights to minimise the leave-one-out error rate. However, one of the advantages of kNN is that, being a relatively simple method, it is usually quite easy to avoid over-fitting (you basically just need to choose k). That advantage is easily lost if you try to tune the distance metric as well, so doing so may make the model's performance worse rather than better.
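A sketch of the leave-one-out approach, using random search over the weights as the questioner suggests (the function names and the uniform sampling scheme are illustrative assumptions, not a prescribed method):

```python
import numpy as np

rng = np.random.default_rng(0)

def loo_error(X, y, w, k=3):
    """Leave-one-out error rate of weighted-Euclidean kNN: predict each
    point from all the *other* points and count the mistakes."""
    Xw = X * w  # apply the feature weights once
    errors = 0
    for i in range(len(X)):
        d = np.sqrt(((Xw - Xw[i]) ** 2).sum(axis=1))
        d[i] = np.inf  # exclude the held-out point itself
        nearest = np.argsort(d)[:k]
        labels, counts = np.unique(y[nearest], return_counts=True)
        errors += labels[np.argmax(counts)] != y[i]
    return errors / len(X)

def random_search_weights(X, y, n_trials=200, k=3):
    """Random search over non-negative weight vectors, keeping the one
    with the lowest leave-one-out error (unit weights as baseline)."""
    best_w = np.ones(X.shape[1])
    best_err = loo_error(X, y, best_w, k)
    for _ in range(n_trials):
        w = rng.uniform(0.0, 2.0, size=X.shape[1])
        err = loo_error(X, y, w, k)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err
```

Since the unweighted metric is included as the starting baseline, the search can never do worse on the leave-one-out criterion itself; the over-fitting risk discussed above shows up on unseen data, not on this score.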