Solved – How does increasing the dimension increase the variance without increasing the bias in kNN?

Tags: bias, k-nearest-neighbour, variance

My question is about understanding Figure 2.8 in The Elements of Statistical Learning (2nd edition). The topic of the section is how increasing the dimension influences the bias and the variance.

I can roughly understand Figure 2.7 in ESL, but have no idea about Figure 2.8. Can anyone explain why the bias stays roughly unchanged while the variance dominates? I cannot picture how they change as the dimension increases.

Here are the details:

Suppose we have 1000 training examples $x_i$ generated uniformly on $[-1,1]^p$. Assume that the true relationship between $X$ and $Y$ (capital letters for variables) is
$$
Y=f(X)=\frac12(X_1 + 1)^3
$$
where $X_1$ denotes the first component of $X$ ($X$ has $p$ components, i.e. features, in total). We use the 1-nearest-neighbor rule to predict $y_0$ at the test point $x_0 = 0$. Denote the training set by $\mathcal{T}$. We can compute the expected prediction error at $x_0$ for our procedure, averaging over all such samples of size 1000. This is the mean squared error (MSE) for estimating $f(0)$:

\begin{align}
\operatorname{MSE}(x_0) &=E_{\mathcal{T}}[f(x_0)-\hat{y}_0]^2 \\
&= E_{\mathcal{T}}[\hat{y}_0-E_{\mathcal{T}}(\hat{y}_0)]^2 +
[E_{\mathcal{T}}(\hat{y}_0)-f(x_0)]^2 \\
&= \operatorname{Var}_{\mathcal{T}}(\hat{y}_0) + \operatorname{Bias}^2(\hat{y}_0)
\end{align}
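
To make the decomposition concrete, here is a minimal simulation sketch (my own, not from the book) that estimates $\operatorname{Var}_{\mathcal{T}}(\hat{y}_0)$ and $\operatorname{Bias}^2(\hat{y}_0)$ by averaging over repeated training sets; the number of repetitions (200) and the grid of dimensions are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x1):
    # Target of the Figure 2.8 example: depends only on the first coordinate
    return 0.5 * (x1 + 1) ** 3

def one_nn_prediction(p, n=1000):
    # Draw a training set uniformly on [-1, 1]^p and return the
    # 1-nearest-neighbour prediction at the test point x0 = 0.
    X = rng.uniform(-1.0, 1.0, size=(n, p))
    nearest = np.argmin(np.sum(X ** 2, axis=1))  # index of the point closest to the origin
    return f(X[nearest, 0])

for p in (1, 2, 4, 6, 8, 10):
    preds = np.array([one_nn_prediction(p) for _ in range(200)])  # 200 training sets
    variance = preds.var()
    bias_sq = (preds.mean() - f(0.0)) ** 2
    print(f"p={p:2d}  Var={variance:.3f}  Bias^2={bias_sq:.3f}  MSE={variance + bias_sq:.3f}")
```

As $p$ grows, the printed variance should grow much faster than the squared bias, which is exactly the pattern in the right panel of the figure.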

The figure is below. The right plot is the case with increasing $p$ (dimension).

Figure 2.8 in ESL

Best Answer

First of all, the bias of a classifier is the discrepancy between its average estimated function and the true function, whereas the variance of a classifier is the expected divergence of the estimated prediction function from its average value (i.e., how strongly the classifier depends on the random sample used as the training set).

Hence, the presence of bias indicates that something is basically wrong with the model, whereas variance is also bad, but a model with high variance could at least predict well on average.

The key to understanding the examples generating Figures 2.7 and 2.8 is the following observation:

The variance is due to the sampling variance of the 1-nearest neighbor. In low dimensions and with $N = 1000$, the nearest neighbor is very close to $0$, and so both the bias and variance are small. As the dimension $p$ increases, the nearest neighbor tends to stray further from the target point, and both bias and variance are incurred. By $p = 10$, for more than $99\%$ of the samples the nearest neighbor is a distance greater than $0.5$ from the origin.
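
The last claim is easy to check numerically. The following sketch (again my own, not from the book) records the distance from the origin to its nearest neighbour over repeated uniform samples of 1000 points; the 500 repetitions are an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def nn_distance(p, n=1000):
    # Distance from the test point x0 = 0 to its nearest neighbour
    # in a sample of n points drawn uniformly on [-1, 1]^p.
    X = rng.uniform(-1.0, 1.0, size=(n, p))
    return np.sqrt(np.min(np.sum(X ** 2, axis=1)))

for p in (1, 2, 5, 10):
    d = np.array([nn_distance(p) for _ in range(500)])
    print(f"p={p:2d}  median distance={np.median(d):.3f}  P(distance > 0.5)={np.mean(d > 0.5):.2%}")
```

So the "nearest" neighbour in $p = 10$ dimensions is typically not near the origin at all, and the prediction $\hat{y}_0$ is driven by whatever value of $X_1$ that faraway point happens to have.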

Recall that the target function of the example generating Figure 2.7 depends on all $p$ variables, and hence the MSE there is largely due to the bias.

Conversely, in Figure 2.8 the target function of the example depends on only one variable, and thus the variance dominates. More generally, this happens when the target function effectively depends on only a few of the $p$ features, i.e. when it has a low intrinsic dimension.
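
To see roughly why the variance wins in the second example, note that for large $p$ the nearest neighbour is essentially an arbitrary training point, so its first coordinate is approximately uniform on $[-1, 1]$ (this is an approximation I am making, not a statement from the book). Writing $Z \sim \mathrm{Unif}[-1,1]$ and $\hat{y}_0 = \frac12(Z+1)^3$,
$$
E[\hat{y}_0] = \frac12 \int_{-1}^{1} \tfrac12 (z+1)^3 \, dz = 1,
\qquad
\operatorname{Bias}^2(\hat{y}_0) \approx \left(1 - \tfrac12\right)^2 = \tfrac14,
$$
$$
E[\hat{y}_0^2] = \frac12 \int_{-1}^{1} \tfrac14 (z+1)^6 \, dz = \tfrac{16}{7},
\qquad
\operatorname{Var}(\hat{y}_0) \approx \tfrac{16}{7} - 1 = \tfrac{9}{7} \approx 1.29,
$$
so under this approximation the variance is roughly five times the squared bias, which is the behaviour shown in the right panel of Figure 2.8.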

I hope this helps.
