Expected test error in regression

Tags: linear-regression, mean-square-error, regression

I am unsure about the definition of the expected test error here. As far as I understand it, the definition is the following.

In a linear model the relationship between the random response variable $Y_i$ and the predictor vector $x_{i}$ is assumed to be of the following form

$$ Y_i = x^T_{i}\beta + \epsilon_i $$

where $\epsilon_i$ has expected value zero and variance $\sigma^2$.
Let $\hat{\beta}$ be the least squares estimator fitted on a training data set $(x_1,y_1),\ldots,(x_n,y_n)$. Now we obtain a new instance $(x,y)$ from the same source as the instances in the training data set. Of course, the observation $y$ is again an observation of a random variable $Y$.
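(Concretely, if $X$ denotes the $n \times p$ design matrix with rows $x_i^T$ and $y = (y_1,\ldots,y_n)^T$, the least squares estimator is $\hat{\beta} = (X^TX)^{-1}X^Ty$, assuming $X^TX$ is invertible.)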

According to the above source, the expected test error is:

$$ \mathbb{E}[(y-x^T\hat{\beta})^2] $$

The above source now claims that

$$ \mathbb{E}[(y-x^T\hat{\beta})^2] = \mathbb{E}[(y-x^T\beta)^2] + \mathbb{E}[(x^T\beta-x^T\hat{\beta})^2] $$
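As far as I can tell, this decomposition follows by writing $y = x^T\beta + \epsilon$ and expanding the square; assuming the noise term $\epsilon$ of the new instance is independent of the training data used to fit $\hat{\beta}$, the cross term

$$ 2\,\mathbb{E}\big[\epsilon\,(x^T\beta - x^T\hat{\beta})\big] = 2\,\mathbb{E}[\epsilon]\,\mathbb{E}\big[x^T\beta - x^T\hat{\beta}\big] = 0 $$

vanishes because $\mathbb{E}[\epsilon] = 0$.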

The source further claims that

$$ \mathbb{E}[(y-x^T\beta)^2] = \sigma^2 $$

Now the last claim is not clear to me. What would have been clear to me is that

$$ \mathbb{E}[(Y-x^T\beta)^2] = \sigma^2 $$

where $Y$ is the random variable rather than the observation $y$ of that random variable. Hence the question arises whether the expected test error is

$$ \mathbb{E}[(Y-x^T\hat{\beta})^2] \quad \text{ rather than } \quad \mathbb{E}[(y-x^T\hat{\beta})^2]. $$

In other words, the question is whether to use the random variable $Y$ in the expected error or the observation $y$ of this random variable.

Best Answer

For the expected error you should use the random variable $Y$; otherwise $(y_i - x_i^T\beta)^2$ is a constant for every $i$. However, in reality you cannot calculate the expected value of a random variable, so you estimate $\sigma^2$ using $\frac{1}{n} \sum_{i=1}^n (y_i - x_i^T\hat{\beta})^2$.
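A minimal simulation sketch of both points (the particular $\beta$, $\sigma$, and sample sizes below are illustrative assumptions, not from the question): the mean squared residual approximates $\sigma^2$, while the expected test error $\mathbb{E}[(Y - x^T\hat{\beta})^2]$ is approximated by averaging over many fresh draws of the random variable $Y$, not by plugging in a single observation $y$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (assumed for this sketch): true beta and noise level.
n, p = 500, 3
beta = np.array([1.0, -2.0, 0.5])
sigma = 1.5

# Training data: Y_i = x_i^T beta + eps_i, with E[eps_i] = 0, Var(eps_i) = sigma^2.
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(scale=sigma, size=n)

# Least squares estimator beta_hat fitted on the training data.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# The answer's estimator of sigma^2: the mean squared residual.
sigma2_hat = np.mean((y - X @ beta_hat) ** 2)

# Expected test error E[(Y - x^T beta_hat)^2] at a fixed new x, approximated
# by Monte Carlo over fresh draws of the random variable Y (not a single y).
x_new = rng.normal(size=p)
Y_new = x_new @ beta + rng.normal(scale=sigma, size=100_000)
test_error = np.mean((Y_new - x_new @ beta_hat) ** 2)

print(f"true sigma^2:          {sigma**2:.3f}")
print(f"mean squared residual: {sigma2_hat:.3f}")
print(f"expected test error:   {test_error:.3f}")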
