First of all, the bias of a classifier is the discrepancy between the average of its estimated function and the true function, whereas the variance of a classifier is the expected divergence of the estimated prediction function from its average value (i.e. how dependent the classifier is on the random sampling used to form the training set).
Hence, the presence of bias indicates something basically wrong with the model itself, whereas variance is also undesirable, but a model with high variance could at least predict well on average.
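To make the decomposition concrete, here is a minimal Monte Carlo sketch checking the identity $\text{MSE} = \text{bias}^2 + \text{variance}$ (assuming `numpy`; it uses a simple shrinkage estimator of a mean, rather than a classifier, purely for illustration, and all the numbers are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
mu, sigma, n, reps = 5.0, 2.0, 20, 100_000

# Hypothetical shrinkage estimator lam * x_bar: biased, but with
# lower variance than the plain sample mean.
lam = 0.8
est = lam * rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

bias = est.mean() - mu   # discrepancy between averaged estimate and truth
variance = est.var()     # spread of the estimate around its own average
print(f"bias^2 + variance = {bias**2 + variance:.4f}")
print(f"MSE directly      = {np.mean((est - mu) ** 2):.4f}")
```

The two printed numbers agree up to Monte Carlo error, which is exactly the bias-variance decomposition in action.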
The key to understanding the examples generating Figures 2.7 and 2.8 is the following:
The variance is due to the sampling variance of the 1-nearest
neighbor. In low dimensions and with $N = 1000$, the nearest neighbor
is very close to $0$, and so both the bias and variance are small. As
the dimension $p$ increases, the nearest neighbor tends to stray
further from the target point, and both bias and variance are
incurred. By $p = 10$, for more than $99\%$ of the samples the nearest
neighbor is a distance greater than $0.5$ from the origin.
Recall that the target function of the example generating Figure 2.7 depends on all $p$ variables, and hence the MSE is largely due to the bias.
Conversely, the target function of the example in Figure 2.8 depends on only one variable, and thus the variance dominates. More generally, variance dominates when the target function is effectively low-dimensional. The short simulation below illustrates how quickly the nearest neighbor strays from the target point as $p$ grows.
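Here is a minimal sketch (assuming `numpy`; the point layout, uniform on $[-1, 1]^p$ with the target at the origin, mirrors the setup of these examples) that checks the quoted claim about nearest-neighbor distances:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
N, trials = 1000, 200

for p in (1, 2, 5, 10):
    # Distance from the origin (the target point) to the nearest of
    # N training points drawn uniformly from [-1, 1]^p.
    d = np.array([
        np.linalg.norm(rng.uniform(-1.0, 1.0, size=(N, p)), axis=1).min()
        for _ in range(trials)
    ])
    print(f"p={p:2d}  median NN distance={np.median(d):.3f}  "
          f"share with distance > 0.5: {np.mean(d > 0.5):.2f}")
```

For $p = 1$ the nearest neighbor is essentially at the origin, while by $p = 10$ it is almost always more than $0.5$ away, matching the quoted figure.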
I hope this helps.
The asymptotic distribution for the sample variance (in the general non-normal case) can be found in O'Neill (2014) (Result 14, p. 285). As others have pointed out in the comments to your question, the more general result can be obtained via a combination of the CLT and Slutsky's theorem, working on an expansion for the sample variance (the cited paper has the proof so you can see that technique).
The generalised asymptotic result is similar to the (exact) distribution for the normal case, except that the degrees-of-freedom parameter is affected by the kurtosis of the underlying distribution. Higher kurtosis in the underlying distribution leads to lower accuracy, since tail values are less rare; lower kurtosis leads to greater accuracy, since tail values are more rare. As can be seen from Result 14 in the above-cited paper, the general case (with finite variance and kurtosis) has the asymptotic approximation:
$$\frac{S^2}{\sigma^2} \sim \frac{\chi^2 (DF_n)}{DF_n} \quad \quad \quad DF_n \equiv \frac{2 \sigma^4}{\mathbb{V}(S^2)} = \frac{2n}{\kappa - (n-3)/(n-1)},$$
where $\kappa$ is the kurtosis of the underlying distribution. In the case of a mesokurtic distribution (such as the normal distribution) you have $\kappa = 3$, which gives $DF_n = n-1$, recovering the well-known exact distribution for the normal case. (You have accidentally squared this term in the equation in your question.) In the case of an underlying platykurtic (leptokurtic) distribution, the degrees-of-freedom parameter is higher (lower) than in the normal case.
As you can see from the definition of the degrees-of-freedom parameter in this result, this parameter is formed from the underlying kurtosis through the variance of the sample variance. (The kurtosis affects the variance of the sample variance, so that is why it enters into this analysis.) The degrees-of-freedom parameter is adjusted to ensure that the variance of the chi-squared distribution matches the true variance of the sampling statistic.
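As a quick check of this, here is a sketch (assuming `numpy`; the Laplace distribution is chosen only because its kurtosis $\kappa = 6$ is known in closed form, and the sample size is arbitrary) comparing the Monte Carlo variance of $S^2/\sigma^2$ with the $2/DF_n$ implied by the scaled chi-squared approximation:

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen arbitrarily
n, reps = 50, 200_000

# Underlying distribution: Laplace(0, 1), with variance 2 and kurtosis 6
# (leptokurtic, so DF_n should come out well below n - 1).
sigma2, kappa = 2.0, 6.0
df_n = 2 * n / (kappa - (n - 3) / (n - 1))

s2 = rng.laplace(0.0, 1.0, size=(reps, n)).var(axis=1, ddof=1)

# The scaled chi-squared approximation implies Var(S^2/sigma^2) = 2/DF_n.
print(f"DF_n = {df_n:.1f}  (vs n - 1 = {n - 1} in the normal case)")
print(f"simulated Var(S^2/sigma^2): {np.var(s2 / sigma2):.4f}")
print(f"implied by approximation:   {2 / df_n:.4f}")
```

The simulated variance matches $2/DF_n$, and the sharply reduced degrees-of-freedom (about $20$ rather than $49$) shows the accuracy penalty from the leptokurtic tails.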
Best Answer
Standard deviations of averages are smaller than standard deviations of individual observations. [Here I will assume independent identically distributed observations with finite population variance; something similar can be said if you relax the first two conditions.]
It's a consequence of the simple fact that the standard deviation of the sum of two random variables is no larger than the sum of their standard deviations (they can only be equal when the two variables are perfectly positively correlated).
In fact, when you're dealing with uncorrelated random variables, we can say something more specific: the variance of a sum of variates is the sum of their variances.
This means that with $n$ independent (or even just uncorrelated) variates with the same distribution, the variance of the mean is the variance of an individual divided by the sample size.
Correspondingly with $n$ independent (or even just uncorrelated) variates with the same distribution, the standard deviation of their mean is the standard deviation of an individual divided by the square root of the sample size:
$\sigma_{\bar{X}}=\sigma/\sqrt{n}$.
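A short sketch verifying this $\sigma/\sqrt{n}$ scaling (assuming `numpy`; the normal population and the particular $\sigma$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
sigma, reps = 3.0, 50_000

for n in (4, 16, 64, 256):
    # Standard deviation of sample means across many replications.
    means = rng.normal(0.0, sigma, size=(reps, n)).mean(axis=1)
    print(f"n={n:3d}  sd of sample means={means.std():.3f}  "
          f"sigma/sqrt(n)={sigma / np.sqrt(n):.3f}")
```

Each fourfold increase in $n$ halves the standard deviation of the mean, exactly as the formula predicts.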
So as you add more data, you get increasingly precise estimates of group means. A similar effect applies in regression problems.
Since we can get more precise estimates of averages by increasing the sample size, we are more easily able to tell apart means which are close together: even though the distributions overlap quite a bit, a large enough sample lets us estimate their population means accurately enough to tell that they're not the same.
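For instance, here is a sketch of that effect (assuming `numpy` and `scipy`; the means $0.0$ and $0.2$ with common sd $1.0$ are arbitrary choices giving heavily overlapping distributions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Two heavily overlapping populations: means 0.0 and 0.2, common sd 1.0.
for n in (20, 200, 2000):
    a = rng.normal(0.0, 1.0, size=n)
    b = rng.normal(0.2, 1.0, size=n)
    t, p = stats.ttest_ind(a, b)
    print(f"n={n:4d}  t={t:6.2f}  p-value={p:.4f}")
```

With small samples the difference is lost in the noise, but as $n$ grows the standard error of each mean shrinks like $1/\sqrt{n}$ and the test separates the two populations decisively.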