Solved – R – Multivariate K-nearest neighbor outlier detection

k-nearest-neighbour, outliers, r

I'm trying to implement the k-nearest neighbor algorithm to detect outliers in a multivariate dataset, but I don't know how to do it. Could you provide an example?

Best Answer

For 1NN outlier detection:

For each object:

  1. compute the distance to all other objects
  2. find the minimum (for larger k, take the k-th smallest distance instead)
  3. store this distance as the object's outlier score; larger scores indicate more isolated objects
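
The steps above can be sketched in base R roughly as follows. This is only a minimal illustration under some assumptions: `knn_outlier_score` is a made-up helper name (not from any package), Euclidean distance is assumed, and the full distance matrix is built in memory, so it only suits small to medium data sets. If your variables are on very different scales, you may want to standardize them first (for example with `scale()`).

```r
# Hypothetical helper: k-th nearest neighbor distance as an outlier score.
knn_outlier_score <- function(x, k = 1) {
  x <- as.matrix(x)
  stopifnot(k >= 1, k < nrow(x))
  # Step 1: pairwise Euclidean distances between all objects (O(n^2) memory).
  d <- as.matrix(dist(x))
  # Ignore each object's zero distance to itself.
  diag(d) <- Inf
  # Step 2: for each object, take the k-th smallest distance to another object.
  # Step 3: the returned vector holds one outlier score per row of x.
  apply(d, 1, function(row) sort(row, partial = k)[k])
}

set.seed(1)
# Toy data (an assumption for illustration): 100 points from a 2-D
# standard normal, plus one planted point far away in row 101.
x <- rbind(matrix(rnorm(200), ncol = 2), c(10, 10))
scores <- knn_outlier_score(x, k = 1)
# Row index of the object with the largest score.
top <- which.max(scores)
```

In this toy data the planted point in row 101 should receive the highest score. To inspect the most suspicious objects, something like `head(order(scores, decreasing = TRUE))` lists the row indices with the largest scores; where to put the cut-off is a judgment call that depends on your data.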

Usually k=1 to k=10 will be enough. See for example:

G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenková, E. Schubert, I. Assent and M. E. Houle. On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study. Data Mining and Knowledge Discovery 30(4): 891-927, 2016. DOI: 10.1007/s10618-015-0444-8

They did an insane number of experiments. But on most data sets, kNN with k=1 was one of the best methods, if I recall correctly.