Machine Learning – When is Nearest Neighbor Meaningful Today?

Tags: k-nearest-neighbour, machine-learning

In 1999, Beyer et al. asked,
When is "Nearest Neighbor" meaningful?

Are there better ways of analyzing and visualizing
the effect of distance flatness on NN search since 1999?

Does [a given] data set provide meaningful answers to the 1-NN problem?
The 10-NN problem? The 100-NN problem?

How would you experts approach this question today?


Edits Monday 24 Jan:

How about "distance whiteout" as a shorter name for "distance flatness
with increasing dimension"?

An easy way to look at "distance whiteout"
is to run 2-NN,
and plot the distances to the nearest and second-nearest neighbors.
The plot below shows dist1 and dist2
for a range of nclusters and dimensions, by Monte Carlo.
This example shows pretty good distance contrast
for the scaled absolute difference |dist2 – dist1|.
(The ratios dist2 / dist1
→ 1 as dimension → ∞, so they become useless.)

Whether absolute errors or relative errors should be used
in a given context
depends of course on the "real" noise present, which is difficult to know.

Suggestion: always run 2-NN;
2 neighbors are useful when they're close, and useful when not.

[Plot: dist1 and dist2 for a range of nclusters and dimensions, by Monte Carlo]
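For concreteness, here is a minimal Monte Carlo sketch of the kind of 2-NN experiment described above (not the code behind the plot): it draws points, computes each point's nearest and second-nearest neighbour distances, and reports how the absolute gap dist2 − dist1 and the ratio dist2 / dist1 behave as the dimension grows. Uniform data on [0,1]^d, Euclidean distance, and the sample size are my assumptions, not from the post.

```python
# Monte Carlo sketch of "distance whiteout": for each point, find the
# distances to its nearest (dist1) and second-nearest (dist2) neighbours,
# then summarise the absolute gap and the ratio as the dimension d grows.
import numpy as np
from scipy.spatial.distance import cdist

def two_nn_distances(X):
    """Return (dist1, dist2): each point's 1st and 2nd nearest-neighbour distances."""
    D = cdist(X, X)                  # pairwise Euclidean distances
    np.fill_diagonal(D, np.inf)      # exclude self-distance
    D.sort(axis=1)
    return D[:, 0], D[:, 1]

rng = np.random.default_rng(0)
n = 500                              # points per trial (assumption)
for d in (2, 10, 100, 1000):
    X = rng.random((n, d))           # i.i.d. uniform on [0,1]^d
    dist1, dist2 = two_nn_distances(X)
    print(f"d={d:5d}  mean |dist2-dist1| = {np.mean(dist2 - dist1):.4f}  "
          f"mean dist2/dist1 = {np.mean(dist2 / dist1):.4f}")
```

Plotting dist1 against dist2 per point, for several dimensions, gives a picture of how much contrast remains between the first and second neighbour.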

Best Answer

I don't have a full answer to this question, but I can give a partial answer on some of the analytical aspects. Warning: I've been working on other problems since the first paper below, so it's very likely there is other good stuff out there I'm not aware of.

First, I think it's worth noting that despite the title of their paper, "When is 'nearest neighbor' meaningful?", Beyer et al. actually answered a different question, namely when NN is not meaningful. We proved the converse to their theorem, under some additional mild assumptions on the size of the sample, in When Is 'Nearest Neighbor' Meaningful: A Converse Theorem and Implications, Journal of Complexity, 25(4), August 2009, pp. 385-397, and showed that there are situations when (in theory) the concentration of distances will not arise. We give examples, but in essence the number of non-noise features needs to grow with the dimensionality, so of course such situations seldom arise in practice. References 1 and 7 cited in our paper give some examples of ways in which the distance concentration can be mitigated in practice.
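For readers who want the shape of the original instability result, it can be paraphrased roughly as follows (from memory, so check Beyer et al. for the exact conditions). If $D_d$ denotes the distance from the query to a random data point in dimension $d$, and $D^{\min}_d$, $D^{\max}_d$ the distances to the nearest and farthest of the sampled points, then

$$\lim_{d\to\infty}\operatorname{Var}\!\left(\frac{D_d}{\mathbb{E}[D_d]}\right)=0 \quad\Longrightarrow\quad \forall\,\varepsilon>0:\ P\!\left[D^{\max}_d \le (1+\varepsilon)\,D^{\min}_d\right]\to 1 .$$

Our converse theorem goes in the other direction: under mild extra conditions on the sample size, when that variance condition fails, the relative contrast between nearest and farthest neighbours does not vanish.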

A paper by my supervisor, Ata Kaban, looks at whether these distance concentration issues persist despite applying dimensionality reduction techniques: On the Distance Concentration Awareness of Certain Data Reduction Techniques, Pattern Recognition, Vol. 44, Issue 2, Feb 2011, pp. 265-277. There's some nice discussion in there too.

A recent paper by Radovanovic et al., Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data, JMLR, 11(Sep), September 2010, pp. 2487-2531, discusses the issue of "hubness", that is, when a small subset of points belongs to the $k$ nearest neighbours of many of the labelled observations. See also the first author's PhD thesis, which is on the web.
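As a quick illustration of hubness (a hedged sketch of the idea, not Radovanovic et al.'s code): one common diagnostic is the skewness of the $k$-occurrence counts $N_k(x)$, the number of points that have $x$ among their $k$ nearest neighbours. This skewness tends to grow with dimension, signalling that a few points become hubs. Gaussian data, Euclidean distance, and $k=10$ are assumptions here.

```python
# Sketch: skewness of the k-occurrence distribution N_k as a hubness indicator.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import skew

rng = np.random.default_rng(0)
k, n = 10, 1000
for d in (3, 30, 300):
    X = rng.standard_normal((n, d))              # i.i.d. Gaussian points
    D = cdist(X, X)
    np.fill_diagonal(D, np.inf)                  # exclude self
    knn = np.argsort(D, axis=1)[:, :k]           # each point's k nearest neighbours
    N_k = np.bincount(knn.ravel(), minlength=n)  # how often each point is someone's neighbour
    print(f"d={d:4d}  skewness of N_k = {skew(N_k):.2f}")
```

A strongly right-skewed $N_k$ means a handful of points dominate the neighbour lists, which matters for any method that relies on $k$-NN votes.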
