MSE Formula – How to Find the Correct Calculation in Linear Regression


Throughout my student life so far, I have always considered the mean squared error to be calculated as $ MSE=\frac{1}{n}\sum(Y_i-\hat{Y}_i)^2$. However, I was looking at one of my statistics modules today, and the slide stated that

[slide image: $\operatorname{MSE} = \dfrac{SSE}{n-2}$]

That would mean that $ MSE=\frac{1}{n-2}\sum(Y_i-\hat{Y}_i)^2$, since $ SSE=\sum(Y_i-\hat{Y}_i)^2$.

Upon researching this, I found the following description on Wikipedia:

mean squared error is sometimes used to refer to the unbiased estimate of error variance: the residual sum of squares divided by the number of degrees of freedom. This definition for a known, computed quantity differs from the above definition for the computed MSE of a predictor, in that a different denominator is used.

I would like to know whether there is a single correct definition, or whether the two MSEs here are actually referring to different concepts. How do I go about understanding the reason for the difference?

Best Answer

Assuming that the slide is talking about linear regression with one input variable, i.e. $$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$, the correct formula for MSE is: $$ \operatorname{MSE} = \frac{1}{n-2} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \ . $$ To reiterate, for the specific case of a linear model with only one input variable the denominator must be $n-2$.
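To see why the $n-2$ matters, here is a minimal simulation sketch (my own illustration, not from the slide, with assumed parameter values): it repeatedly fits a one-variable regression and shows that $SSE/(n-2)$ averages out to the true error variance $\sigma^2$, while $SSE/n$ is biased downward.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 2.0, 0.5, 1.0   # assumed true parameters for the simulation
n, n_sims = 20, 10_000

mse_n2, mse_n = [], []
for _ in range(n_sims):
    x = rng.uniform(0, 10, size=n)
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)
    # least-squares fit of y = b0 + b1 * x (polyfit returns slope first)
    b1, b0 = np.polyfit(x, y, 1)
    resid = y - (b0 + b1 * x)
    sse = np.sum(resid ** 2)
    mse_n2.append(sse / (n - 2))
    mse_n.append(sse / n)

print(np.mean(mse_n2))  # close to 1.0 = sigma^2  (unbiased)
print(np.mean(mse_n))   # close to 0.9            (biased downward by (n-2)/n)
```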

In the more general case when you have a linear model with $k$ input variables that is: $$ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i \ , $$ then the MSE would be: $$ \operatorname{MSE} = \frac{1}{n-(k+1)} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \ . $$
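As a sketch of the general case (again my own illustration with assumed values), the same calculation with $k$ predictors only changes the denominator to $n-(k+1)$, i.e. the sample size minus the number of estimated coefficients including the intercept:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, sigma = 50, 3, 2.0              # assumed sample size, predictors, noise sd
X = rng.normal(size=(n, k))
beta = np.array([1.0, -2.0, 0.5])     # assumed true coefficients
y = 4.0 + X @ beta + rng.normal(0, sigma, size=n)

# design matrix with an intercept column -> k + 1 estimated parameters
X_design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

resid = y - X_design @ beta_hat
sse = np.sum(resid ** 2)
mse = sse / (n - (k + 1))             # unbiased estimate of sigma^2
print(mse)
```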

I am not aware of any model in which the denominator would be $n$. A denominator of $n$ is usually only appropriate when we know the population parameters $\beta_j$, in which case we are computing the true residual variance rather than estimating it.
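Here is a small sketch of that last point (with parameter values assumed for illustration): if the true $\beta_j$ are known, the residuals are the actual errors, so dividing their sum of squares by $n$ already averages to $\sigma^2$, because no degrees of freedom are spent on estimating coefficients.

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, sigma = 2.0, 0.5, 1.0   # assumed *known* population parameters
n, n_sims = 20, 10_000

var_n = []
for _ in range(n_sims):
    x = rng.uniform(0, 10, size=n)
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)
    errors = y - (beta0 + beta1 * x)  # true errors: nothing is estimated
    var_n.append(np.sum(errors ** 2) / n)

print(np.mean(var_n))  # close to 1.0 = sigma^2, so dividing by n is fine here
```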