[Math] Expected Prediction Error (EPE) a function of

descriptive statistics, statistical-inference, statistics, terminology

Context

I am self-studying Elements of Statistical Learning (2nd ed), by Hastie, Tibshirani & Friedman. I have a question regarding what the EPE, as defined in this text, is a function of. Namely, on page 10 (equation 2.9), EPE is defined as:

$\text{EPE}(f) = \text{E}_{X, Y}(Y - f(X))^2$

This implies that EPE is a function of the learner, $f$, and that it is an expectation over the joint distribution of $X$ and $Y$.

I accept and understand this definition, since it distinguishes EPE from mean squared error (MSE), which is defined for some $x_0 \in \Omega_X$ as:

$\text{MSE}(x_0) = \text{E}_{\mathcal{T}}(y_0 - \hat{f}(x_0))^2$

Here $\mathcal{T}$ represents the training set, and $\hat{f}$ represents the learner fitted to it. Note that the above equation assumes the relationship between $X$ and $Y$ is deterministic, so $y_0$ is treated as a constant.

Question

Where my confusion arises is in the use of EPE on page 18 (equation 2.27). The context of its use is this: the relationship between $Y$ (the dependent variable) and $X$ (the independent variable) is assumed to be linear in $X$:

$Y = X^T\beta + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma^2)$ independently of $X$.

Then, for some arbitrary test point $x_0$, equation 2.27 is stated in the following manner:

$\text{EPE}(x_0) = \text{E}_{y_0|x_0}\text{E}_{\mathcal{T}}(y_0 - \hat{y}_0)^2$, where $\mathcal{T}$ is the training set.
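
For reference, and as far as I can tell from the text, the same equation continues with the following decomposition (here $\hat{y}_0 = \hat{f}(x_0)$ is the least squares prediction at $x_0$, and the cross terms vanish because the noise in $y_0$ is independent of $\mathcal{T}$):

$\text{EPE}(x_0) = \text{Var}(y_0|x_0) + \text{E}_{\mathcal{T}}[\hat{y}_0 - \text{E}_{\mathcal{T}}\hat{y}_0]^2 + [\text{E}_{\mathcal{T}}\hat{y}_0 - x_0^T\beta]^2 = \sigma^2 + \text{Var}_{\mathcal{T}}(\hat{y}_0) + \text{Bias}^2(\hat{y}_0)$

So the last two terms are the variance and squared bias that make up $\text{MSE}(x_0)$ when the target is the noiseless value $x_0^T\beta$, while the first term, $\sigma^2$, comes from the noise in $y_0$.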

My issue with the above is that I cannot see how the use of EPE in equation 2.27 is equivalent to that of equation 2.9; is there a way to show that they are? Or is the latter notation incorrect, and EPE is only a function of the learner?

Related question

I have just taken a look at the "related questions" after posting this one, and have found the following question. It appears to express a sentiment similar to mine. The accepted answer, as I understand it, seemingly implies the usage of EPE in equation 2.27 is inconsistent. If this is the case, and the two usages cannot be reconciled, I suppose this question can be taken down as a duplicate (? New here.).

Best Answer

The two usages of EPE are not equivalent; the latter usage is closer to an "enhanced MSE". Namely, in equation 2.27 the expectation is of the squared difference between the true and predicted values of the dependent variable, taken over the distribution of the training set $\mathcal{T}$ and over $y_0$ conditional on $X = x_0$. The "conditional on $X = x_0$" part distinguishes it from MSE as I have understood it to be defined. It is needed here because the relationship between $X$ and $Y$ is not deterministic.
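
To make the "enhanced MSE" reading concrete, here is a minimal Monte Carlo sketch in Python. It is not from the book: the Gaussian design matrix, the particular $\beta$, $x_0$, training-set size and the function name `simulate_epe_at_x0` are all assumptions chosen for illustration. It estimates the double expectation in equation 2.27 by redrawing both the training set and $y_0$ at a fixed $x_0$, and alongside it the MSE of $\hat{y}_0$ against the noiseless value $x_0^T\beta$.

```python
import numpy as np


def simulate_epe_at_x0(x0, beta, sigma=1.0, n_train=50, n_reps=20000, seed=0):
    """Monte Carlo estimates of EPE(x0) and MSE(x0) for least squares.

    Assumes (for illustration only) Y = X^T beta + eps with eps ~ N(0, sigma^2)
    and a standard Gaussian design for the training inputs.
    """
    rng = np.random.default_rng(seed)
    p = beta.shape[0]
    f_x0 = x0 @ beta  # noiseless value x0^T beta

    sq_err_noisy = np.empty(n_reps)  # (y0 - y0_hat)^2, y0 random given x0
    sq_err_mean = np.empty(n_reps)   # (x0^T beta - y0_hat)^2, target fixed

    for r in range(n_reps):
        # Draw a fresh training set T and fit least squares on it.
        X = rng.normal(size=(n_train, p))
        y = X @ beta + sigma * rng.normal(size=n_train)
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        y0_hat = x0 @ beta_hat

        # Draw a fresh response at the fixed test point x0.
        y0 = f_x0 + sigma * rng.normal()

        sq_err_noisy[r] = (y0 - y0_hat) ** 2
        sq_err_mean[r] = (f_x0 - y0_hat) ** 2

    return sq_err_noisy.mean(), sq_err_mean.mean()


# Hypothetical parameter values, chosen only for the demonstration.
epe, mse = simulate_epe_at_x0(
    x0=np.array([1.0, 0.5, -1.0]),
    beta=np.array([2.0, -1.0, 0.5]),
    sigma=1.0,
)
print("EPE(x0) estimate:      ", epe)
print("sigma^2 + MSE(x0) est.:", 1.0 + mse)
```

Under these assumptions the two printed numbers should agree up to simulation noise, which is the sense in which EPE at $x_0$ is MSE plus the irreducible $\sigma^2$. Averaging the first quantity over random draws of $x_0$ as well would give an error averaged over the joint distribution of $X$ and $Y$, though still for a learner that depends on $\mathcal{T}$ rather than for a fixed $f$ as in equation 2.9.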
