# Solved – Is the MSE of a vector a scalar or a matrix?

Tags: bias, least-squares, linear, mse, variance

Suppose $$Y = X\beta + \epsilon,$$ where $$Y$$ is $$n \times 1$$, $$X$$ is $$n \times p$$, $$\beta$$ is $$p \times 1$$, and $$\epsilon$$ is $$n \times 1$$ with mean $$0$$ and covariance matrix $$\sigma^2 I$$. The OLS estimator of $$\beta$$ is $$\hat{\beta} = (X^TX)^{-1}X^TY$$. Letting $$\hat{Y} = X\hat{\beta}$$, we have $$\hat{Y} = X(X^TX)^{-1}X^TY$$.

My question is, what is the MSE of $$\hat{Y}$$? Is it

$$\operatorname{MSE}(\hat{Y}) = \operatorname{E} \left [\left(\hat{Y}-X\beta\right)\left(\hat{Y}-X\beta\right)^T \right]$$ or

$$\operatorname{MSE}(\hat{Y}) = \operatorname{E} \left [\left(\hat{Y}-X\beta\right)^T\left(\hat{Y}-X\beta\right) \right]$$?

The former has dimension $$n \times n$$, and the latter has dimension $$1 \times 1$$.
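To see the dimension difference concretely, here is a small NumPy sketch (the data are simulated; the specific sizes and coefficients are only for illustration). The outer product of the error vector with itself is an $$n \times n$$ matrix, while the inner product is a scalar equal to the trace of that matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
Y = X @ beta + rng.normal(scale=0.3, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)  # OLS estimate
Y_hat = X @ beta_hat
e = Y_hat - X @ beta  # error of the fitted values relative to the true mean

outer = np.outer(e, e)  # (Yhat - X beta)(Yhat - X beta)^T : n x n matrix
inner = e @ e           # (Yhat - X beta)^T (Yhat - X beta) : scalar

print(outer.shape)      # (50, 50)
print(np.ndim(inner))   # 0  (a scalar)
```

Note that the scalar form is just the trace of the matrix form, which is why the two definitions carry related information.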

The latter is closest to correct: the $$MSE$$ is a scalar. (It is a little odd to have $$\hat{Y}$$ inside the expectation, since $$\hat{Y}=X\hat\beta$$ is itself a random quantity; in practice many people drop the expectation.)
It may therefore be easier to remember that $$MSE = \frac{RSS}{df},$$ where $$RSS=(Y-X\hat\beta)^T(Y-X\hat\beta)$$ is the residual sum of squares
and $$df=n-p$$ is the degrees of freedom.
Finally, a point of convention: we typically say there are $$p$$ covariates, not counting the intercept, so the $$X$$ actually used in the model is $$n\times (p+1)$$. Hence we usually write $$df=n-p-1$$.
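The $$RSS/df$$ recipe can be sketched as follows (again with simulated data; the sample size, coefficients, and noise level are arbitrary choices for the example). With an intercept column included, the degrees of freedom are $$n-p-1$$, and $$RSS/df$$ is an unbiased estimate of $$\sigma^2$$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 2  # p covariates; the intercept column is added separately
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # n x (p+1)
beta = np.array([0.5, 1.0, -1.5])
sigma = 2.0
Y = X @ beta + rng.normal(scale=sigma, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)  # OLS fit
resid = Y - X @ beta_hat
rss = resid @ resid        # RSS = (Y - X beta_hat)^T (Y - X beta_hat), a scalar
df = n - p - 1             # degrees of freedom with an intercept
mse = rss / df             # estimates sigma^2 (here sigma^2 = 4)
print(mse)
```

With a large sample, `mse` should land near the true $$\sigma^2 = 4$$, though any single simulation will fluctuate around it.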