Solved – “variance of residuals” versus estimated residual variance

Tags: residuals, variance

I was instructed on an assignment to "calculate the variance of the residuals obtained from your fitted equation." It was a simple linear regression, so I thought "ok, it's just the sum of squared residuals divided by $(n - 2)$, since two degrees of freedom are lost from estimating the intercept and slope coefficients." Wrong. He didn't want me to estimate the residual variance. Instead, I was told that I was supposed to divide by $(n - 1)$. I don't understand why this would be done.

Variance can only be calculated around a parameter: it is the sum of squared deviations from that parameter (or those parameters) divided by the degrees of freedom resulting from the sample size and the constraints imposed by estimating the parameter. If we're descriptively calculating the variance of one variable in a single population, the parameter would be a mean, so the degrees of freedom would be $(n - 1)$. I understand that, and I understand why it's true. But if the parameter is a "fitted equation" referring to a simple linear model, I don't see any way around using two parameters and therefore having $(n - 2)$ degrees of freedom when discussing the variance of the residuals.
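To make the two computations concrete, here is a minimal sketch in Python/NumPy on a simulated dataset (the data, seed, and true coefficients are illustrative assumptions, not from the assignment). It fits a simple linear regression, computes the residuals, and shows the two competing divisors applied to the same sum of squared residuals:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.5, n)  # hypothetical data, true error sd = 1.5

# Ordinary least squares fit; polyfit returns coefficients highest degree first
slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)
ssr = np.sum(resid**2)

var_n2 = ssr / (n - 2)  # unbiased estimator of the error variance (two params fitted)
var_n1 = ssr / (n - 1)  # "sample variance"-style divisor the instructor asked for
print(var_n2, var_n1)
```

Note that with an intercept in the model the residuals already sum to zero, so subtracting their mean before squaring changes nothing; the only difference between the two statistics is the divisor.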

Can someone enlighten me as to what I'm misunderstanding, and what the difference between "variance of residuals" and "estimated residual variance" is?

Best Answer

Unless there is some underlying lesson or instruction I am missing, I find your instructor's approach a bit silly here. When computing the "variance" of a sample of observed quantities, we are really trying to form an estimator for the variance of the underlying random variable it represents. In my view, it is therefore more sensible to view the statistic you computed (the unbiased error variance estimator) as the proper "variance" in this case.$^\dagger$ The statistic your instructor is suggesting is one that incorporates Bessel's correction for a standard IID sample, but the residuals are not a sample of this kind, and consequently the statistic he is proposing is not an unbiased estimator of anything useful here.

It is possible that your instructor wanted you to compute the "sample variance" of the residuals using the standard formula, perhaps for the purpose of stressing to you that this is not equivalent to the unbiased error variance estimator in this case. Perhaps he is trying to impart some lesson here about the differences between the unbiased variance estimator in the IID case versus the unbiased estimator in the regression model. In any case, you seem to understand the matter well, so don't sweat it if you were marked incorrectly.
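The bias claim above is easy to check numerically. The following sketch (a Monte Carlo simulation under assumed true parameters, not anything from the original thread) repeatedly generates data with known error variance $\sigma^2 = 4$, fits the regression, and averages both statistics; the $(n-2)$ divisor recovers $\sigma^2$ on average, while the $(n-1)$ divisor systematically understates it by the factor $(n-2)/(n-1)$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2, reps = 20, 4.0, 5000  # illustrative choices
x = np.linspace(0, 10, n)

est_n2, est_n1 = [], []
for _ in range(reps):
    # Simulate from a known model: y = 1 + 2x + error, Var(error) = sigma2
    y = 1.0 + 2.0 * x + rng.normal(0, np.sqrt(sigma2), n)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    ssr = np.sum(resid**2)
    est_n2.append(ssr / (n - 2))
    est_n1.append(ssr / (n - 1))

print(np.mean(est_n2))  # hovers around 4.0
print(np.mean(est_n1))  # hovers around 4.0 * (n - 2) / (n - 1), i.e. biased low
```

This follows from $SSR/\sigma^2 \sim \chi^2_{n-2}$ under the normal linear model, so $E[SSR] = (n-2)\sigma^2$ and only the $(n-2)$ divisor yields an unbiased estimator.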


$^\dagger$ In the comments, whuber points out that the "variance" of a sample of values is sometimes regarded as the sum of squares divided by $n$ --- this definition comes from the fact that it is the variance of the empirical distribution of the sample. I am somewhat in the minority in the statistical profession in regarding this as a poor definition of the "variance" of a sample. In any case, this is not what your instructor is referring to.
