In-Sample and Training Error – Difference Between In-Sample Error and Training Error with Intuition of Optimism


In the book Elements of Statistical Learning in Chapter 7 (page 228), the training error is defined as:
$$\overline{err} = \frac{1}{N}\sum_{i=1}^{N}{L(y_i,\hat{f}(x_i))}$$

whereas the in-sample error is defined as:
$$Err_{in} = \frac{1}{N}\sum_{i=1}^{N}{E_{Y^0}[L(Y_{i}^{0},\hat{f}(x_i))|\mathcal{T}]}$$

The $Y^0$ notation indicates that we observe $N$ new response values at each of the training points $x_i, i = 1, 2, \ldots, N$.

This seems to be exactly the same as the training error, because the training error is also calculated by computing the responses on the training set using the fitted estimate $\hat{f}(x)$. I have read other explanations of this concept, but could not understand the difference between training error and in-sample error, or why the optimism is not always 0:
$$op \equiv Err_{in} - \overline{err}$$

So how are the errors $Err_{in}$ and $\overline{err}$ different, and what is the intuitive understanding of optimism in this context?

Additionally, what does the author mean by "usually biased downward" in the statement:

This is typically positive since $\overline{err}$ is usually biased downward as an estimate of prediction error.

while describing Optimism (Elements of Statistical Learning, page 229)

Best Answer

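$Y^0$ in this setup has a random part, e.g. an additive error $\varepsilon\sim N(0,\sigma_\varepsilon^2)$. So for a fixed $(x,y)\in\mathcal{T}$, a new response $Y^0$ at the predictor $x$ need not be the same as the corresponding training response $y$, hence the expectation $\operatorname{E}_{Y^0}$. The training error compares $\hat{f}(x_i)$ with the very responses $y_i$ that were used to fit $\hat{f}$, while the in-sample error compares it with fresh responses at the same $x_i$, whose noise the fit has never seen. "Biased downward" just means that $\overline{\mathrm{err}}$ is on average less than the true prediction error.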
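A small simulation may make the gap concrete. The sketch below is only an illustration under assumptions that are not in the question: squared-error loss, a linear true model fitted by ordinary least squares with $d$ coefficients, Gaussian noise, and hypothetical names such as `simulate_optimism`. The inputs $x_i$ are held fixed and only the training responses are redrawn. For squared error the inner expectation over $Y^0$ has the closed form $\sigma_\varepsilon^2 + (f(x_i)-\hat{f}(x_i))^2$, so no new responses need to be sampled. Averaging over many training sets estimates the expected optimism, which in this particular setting should come out near $2d\sigma_\varepsilon^2/N$.

```python
import numpy as np


def simulate_optimism(n=50, d=5, sigma=1.0, reps=2000, seed=0):
    """Average training error and in-sample error over many training sets.

    Assumed setup (illustrative only): fixed inputs X, linear truth,
    Gaussian noise, ordinary least squares, squared-error loss.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))      # training inputs x_i, held fixed
    beta = rng.normal(size=d)        # hypothetical true coefficients
    f_true = X @ beta                # noiseless responses f(x_i)

    train_errs = np.empty(reps)
    in_sample_errs = np.empty(reps)
    for r in range(reps):
        # Draw one training set: same x_i, new noisy responses y_i.
        y = f_true + rng.normal(scale=sigma, size=n)
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        y_hat = X @ beta_hat

        # Training error: compare the fit with the responses it was fitted to.
        train_errs[r] = np.mean((y - y_hat) ** 2)

        # In-sample error: expected loss against NEW responses Y^0 at the
        # same x_i. For squared error this is sigma^2 + (f(x_i) - y_hat_i)^2.
        in_sample_errs[r] = sigma ** 2 + np.mean((f_true - y_hat) ** 2)

    return train_errs.mean(), in_sample_errs.mean()


mean_err, mean_err_in = simulate_optimism()
print("average training error :", mean_err)
print("average in-sample error:", mean_err_in)
print("average optimism       :", mean_err_in - mean_err)
print("2 * d * sigma^2 / N    :", 2 * 5 * 1.0 ** 2 / 50)
```

With these defaults the training error averages below the noise variance and the in-sample error above it: the fit has absorbed part of the training noise, which lowers the error on the original $y_i$ but is of no help on fresh $Y_i^0$ at the same points.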
