Solved – When should a weighted KNN be used (or not)

k-nearest-neighbour, machine-learning

By default, machine learning packages turn inverse distance weighting (IDW) off for KNN. To me, it seems that inverse distance weighting would always be a good option.

Why would we not want to use IDW with KNN? [And why would we want to?]
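For concreteness, here is a minimal sketch (assuming scikit-learn, where `weights="uniform"` is the default and `weights="distance"` enables inverse distance weighting) comparing the two on the same synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data just for illustration.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Default: every one of the k neighbours gets an equal vote.
uniform_knn = KNeighborsClassifier(n_neighbors=5, weights="uniform")

# Inverse distance weighting: closer neighbours get larger votes.
weighted_knn = KNeighborsClassifier(n_neighbors=5, weights="distance")

for name, model in [("uniform", uniform_knn), ("distance", weighted_knn)]:
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))
```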

Best Answer

The idea is to be more robust against variations in the distances of the k nearest neighbors, which can otherwise lead to wrong decisions. Weighting also produces smoother decision surfaces.

The assumption is that neighbors closest to the sample should be given more weight in the vote that decides which class the sample belongs to, since they are more similar to it.
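A small sketch of that idea (the distances and labels below are made up for illustration): with uniform voting each of the k neighbors counts equally, while inverse distance weighting lets the closest neighbors dominate the vote.

```python
import numpy as np

distances = np.array([0.2, 0.5, 1.5, 1.6, 1.7])  # distances to the k=5 nearest neighbors
labels    = np.array([0,   0,   1,   1,   1])     # their class labels

# Uniform vote: class 1 wins 3 to 2.
uniform_vote = np.bincount(labels, minlength=2)

# Inverse-distance vote: the two very close class-0 neighbors dominate.
weights = 1.0 / distances
weighted_vote = np.bincount(labels, weights=weights, minlength=2)

print("uniform :", uniform_vote, "->", uniform_vote.argmax())    # -> class 1
print("weighted:", weighted_vote, "->", weighted_vote.argmax())  # -> class 0
```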

This is especially relevant for samples close to the decision surface, which are sensitive to effects like noise or sampling differences between classes. It is not the only way to cope with such problems, though. For example, this paper proposes an alternative weighting for the neighbors which, besides being more effective for imbalanced data, has a nice probabilistic interpretation.