Solved – Is the MSE of a vector a scalar or a matrix?

Tags: bias, least squares, linear, mse, variance

Suppose $Y = X\beta + \epsilon$, where $Y$ is $n \times 1$, $X$ is $n \times p$, $\beta$ is $p \times 1$, and $\epsilon$ is $n \times 1$ with mean $0$ and covariance $\sigma^2 I$. The OLS estimator of $\beta$ is $\hat{\beta} = (X^TX)^{-1}X^TY$. Let $\hat{Y} = X\hat{\beta}$; then $\hat{Y} = X(X^TX)^{-1}X^TY$.

My question is, what is the MSE of $\hat{Y}$? Is it

$\operatorname{MSE}(\hat{Y}) = \operatorname{E} \left [\left(\hat{Y}-X\beta\right)\left(\hat{Y}-X\beta\right)^T \right] $ or

$\operatorname{MSE}(\hat{Y}) = \operatorname{E} \left [\left(\hat{Y}-X\beta\right)^T\left(\hat{Y}-X\beta\right) \right]$?

The former has dimension $n \times n$, and the latter has dimension $1 \times 1$.
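The dimension claim is easy to verify numerically. Below is a minimal numpy sketch (the dimensions, $\beta$ values, and noise scale are made up for illustration) that builds both products and checks their shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3  # illustrative sizes, not from the question

# Simulate Y = X beta + eps
X = rng.normal(size=(n, p))
beta = np.array([[1.0], [2.0], [-0.5]])
Y = X @ beta + rng.normal(scale=0.3, size=(n, 1))

# OLS fit: beta_hat = (X^T X)^{-1} X^T Y (solve is preferred over an explicit inverse)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ beta_hat

d = Y_hat - X @ beta  # the error vector from the question, n x 1

outer = d @ d.T  # (Y_hat - X beta)(Y_hat - X beta)^T -> n x n matrix
inner = d.T @ d  # (Y_hat - X beta)^T (Y_hat - X beta) -> 1 x 1, i.e. a scalar

print(outer.shape)  # (50, 50)
print(inner.shape)  # (1, 1)
```

The outer product keeps all cross terms $d_i d_j$, while the inner product sums the squares $d_i^2$ into a single number.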

Best Answer

The latter equation is the correct one: the MSE is a scalar. (It is a little unusual to see $\hat{Y}$ inside the expectation, since $\hat{Y}=X\hat\beta$ is itself a random quantity computed from the data.) In practice, many people drop the expectation and work with the observed squared error instead.

In that spirit, it may be easier to remember $MSE = \frac{RSS}{df}$, where $RSS$ is the residual sum of squares, $RSS=(Y-X\hat\beta)^T(Y-X\hat\beta)$, and the degrees of freedom are $df=n-p$.
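That rule is a one-liner in code. Here is a short numpy sketch (simulated data with arbitrary sizes, chosen only for illustration) computing $MSE = RSS/df$ with $df = n - p$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 4  # illustrative sizes

# Simulate a linear model Y = X beta + eps
X = rng.normal(size=(n, p))
beta = rng.normal(size=(p, 1))
Y = X @ beta + rng.normal(size=(n, 1))

# OLS residuals
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_hat

rss = (resid.T @ resid).item()  # residual sum of squares, a scalar
mse = rss / (n - p)             # MSE = RSS / df with df = n - p

print(mse)
```

Note that `rss` is a $1 \times 1$ array collapsed to a Python float with `.item()`, matching the "latter equation" above.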

Finally, one point of convention: We typically say there are $p$ covariates, not including the intercept. Thus the $X$ used in the model is typically $n\times (p+1)$. Hence we usually write that $df=n-p-1$.
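Under that convention the design matrix gains a column of ones. A brief numpy sketch (again with made-up sizes and coefficients) of the $n \times (p+1)$ design and the corresponding $df = n - p - 1$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 2  # p covariates, NOT counting the intercept

Z = rng.normal(size=(n, p))
X = np.column_stack([np.ones(n), Z])  # design matrix is n x (p + 1)

beta = np.array([[0.5], [1.0], [-2.0]])  # intercept plus p slopes
Y = X @ beta + rng.normal(size=(n, 1))

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_hat

# With an intercept the model estimates p + 1 parameters, so df = n - p - 1
mse = (resid.T @ resid).item() / (n - p - 1)

print(X.shape)  # (30, 3)
```

The degrees of freedom drop by one for each estimated parameter, which is why the intercept column costs an extra degree of freedom.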