Solved – Why is KNN not “model-based”

k-nearest-neighbour, machine-learning, model

Section 2.4 of ESL seems to classify linear regression as "model-based" because it assumes $f(x) \approx x\cdot\beta$, whereas no similar approximation is stated for k-nearest neighbors. But aren't both methods making assumptions about $f(x)$?

Later on in 2.4 it even says:

  • Least squares assumes $f(x)$ is well approximated by a globally linear function.
  • k-nearest neighbors assumes $f(x)$ is well approximated by a locally constant function.

The KNN assumption seems like it could also be formalized (though I'm not sure whether doing so would lead to the KNN algorithm in the way that assuming $f$ is linear leads to linear regression).
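For reference, ESL does write the kNN fit explicitly as an average over the neighbourhood of the $k$ training points closest to $x$:

$$\hat f(x) = \frac{1}{k}\sum_{x_i \in N_k(x)} y_i,$$

which looks like a formal statement of the "locally constant" approximation.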

So if KNN actually isn't model-based, why not? Or am I misreading ESL?

Best Answer

It is quite hard to compare kNN and linear regression directly, as they are very different things; however, I think the key point here is the difference between "modelling $f(x)$" and "having assumptions about $f(x)$".

When doing linear regression, one models $f(x)$ explicitly, often with something along the lines of $y = \mathbf{w}^\top\mathbf{x} + \epsilon$, where $\epsilon$ is a Gaussian noise term. One can work out that the maximum-likelihood fit under this model is equivalent to the fit that minimizes the sum-of-squares error.
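A quick sketch of that equivalence, assuming i.i.d. Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma^2)$: the log-likelihood of the training pairs $(\mathbf{x}_i, y_i)$ is

$$\log L(\mathbf{w}) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \mathbf{w}^\top\mathbf{x}_i\right)^2,$$

and since the first term does not depend on $\mathbf{w}$, maximizing the likelihood over $\mathbf{w}$ is the same as minimizing the sum of squared errors.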

KNN, on the other hand, as your second bullet suggests, assumes only that the function can be approximated by a locally constant function: points that are close under some distance measure on the $x$'s are assumed to have similar values of $f(x)$, without the whole relationship being modelled explicitly.

In other words, once fitted, linear regression can form a good estimate of $f(x)$ for an unseen $x$ from the value of $x$ alone, whereas kNN needs extra information (namely the $k$ nearest training points) to make a prediction about $f(x)$, because the value of $x$ by itself carries no information when there is no model for $f(x)$.
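Here is a minimal code sketch of this difference (illustrative only; the `predict_knn` below simply averages the responses of the $k$ nearest training points, and all names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=100)             # 1-D inputs
y = 2.0 * X + rng.normal(0, 0.5, size=100)   # data on a (noisy) line

# Linear regression: fit once, then predict from the coefficients alone.
slope, intercept = np.polyfit(X, y, deg=1)

def predict_linear(x_new):
    return slope * x_new + intercept         # only the fitted model is needed

# kNN regression: no fitted model; every prediction consults the training data.
def predict_knn(x_new, k=5):
    idx = np.argsort(np.abs(X - x_new))[:k]  # indices of the k nearest points
    return y[idx].mean()                     # locally constant estimate

print(predict_linear(1.5), predict_knn(1.5))  # both close to 3.0 here
```

Note that `predict_linear` closes over two numbers (the fitted parameters), while `predict_knn` closes over the entire training set; that is the sense in which only the former has a model.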

EDIT: restating the point below to express it more clearly (see comments)

It is clear that both linear regression and nearest-neighbour methods aim at predicting the value of $y=f(x)$ for a new $x$; they just take two different approaches. Linear regression proceeds by assuming that the data fall on a straight line (plus or minus some noise), so the value of $y$ is taken to be the value of $x$ times the slope of the line (plus an intercept). In other words, linear regression models the data as a straight line.

Nearest-neighbour methods, by contrast, do not care what the data look like (they do not model the data); that is, they do not care whether it is a line, a parabola, a circle, etc. All they assume is that $f(x_1)$ and $f(x_2)$ will be similar if $x_1$ and $x_2$ are similar. Note that this assumption is roughly true for pretty much any model, including all the ones mentioned above. However, a NN method cannot tell how the value of $f(x)$ is related to $x$ (whether it is a line, a parabola, etc.), because it has no model of this relationship; it just assumes the relationship can be approximated by looking at nearby points.
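To make the last point concrete, here is another small sketch (same caveats as above): when the data lie on a parabola, kNN's locally constant assumption still works point by point, while the globally linear model fails away from its average behaviour.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=200)
y = X**2 + rng.normal(0, 0.2, size=200)      # data on a (noisy) parabola

slope, intercept = np.polyfit(X, y, deg=1)   # global straight-line model

def predict_knn(x_new, k=10):
    idx = np.argsort(np.abs(X - x_new))[:k]  # k nearest training points
    return y[idx].mean()

for x0 in (-2.0, 0.0, 2.0):
    print(f"x={x0:+.1f}  true={x0**2:.2f}  "
          f"linear={slope * x0 + intercept:.2f}  knn={predict_knn(x0):.2f}")
```

On this data the linear fit predicts roughly the same value everywhere (the slope is near zero by symmetry), whereas kNN tracks the parabola without ever knowing it is one.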
