By construction, the residuals in an OLS model (the in-sample estimates of the error term) are uncorrelated with the observed values of the X covariates. This holds for the observed data even when the model yields biased estimates because one of its assumptions is violated (say, an omitted variable or reverse causality). Since the predicted values are entirely a function of these covariates, they are also uncorrelated with the residuals. Thus, when you plot residuals against predicted values, they should always look random, because they are indeed uncorrelated by construction of the estimator. In contrast, it is entirely possible (and indeed probable) for the residuals to be correlated with Y in practice. For example, with a dichotomous X variable, the farther the true Y is from either $E(Y \mid X = 1)$ or $E(Y \mid X = 0)$, the larger the residual will be. Here is the same intuition with simulated data in R, where we know the model is unbiased because we control the data-generating process:
rm(list = ls())
set.seed(21391209)
trueSd <- 10
trueA <- 5
trueB <- as.matrix(c(3, 5, -1, 0))
sampleSize <- 100

# create x-values (x3 and x4 are deliberately correlated with x1 and x2)
x1 <- rnorm(n = sampleSize, mean = 0, sd = 4)
x2 <- rnorm(n = sampleSize, mean = 5, sd = 10)
x3 <- 3 + x1 * 4 + x2 * 2 + rnorm(n = sampleSize, mean = 0, sd = 10)
x4 <- -50 + x1 * 7 + x2 * .5 + x3 * 2 + rnorm(n = sampleSize, mean = 0, sd = 20)
X <- as.matrix(cbind(x1, x2, x3, x4))

# create dependent values according to a + Xb + N(0, sd)
Y <- trueA + X %*% trueB + rnorm(n = sampleSize, mean = 0, sd = trueSd)

df <- as.data.frame(cbind(Y, X))
colnames(df) <- c("y", "x1", "x2", "x3", "x4")

ols <- lm(y ~ x1 + x2 + x3 + x4, data = df)
y_hat <- predict(ols, df)
error <- Y - y_hat

cor(y_hat, error)  # zero (up to numerical precision)
cor(Y, error)      # not zero
We get the same result of zero correlation with the fitted values even from a biased model, for example if we omit x1:
ols2 <- lm(y ~ x2 + x3 + x4, data = df)
y_hat2 <- predict(ols2, df)
error2 <- Y - y_hat2

cor(y_hat2, error2)  # still zero
cor(Y, error2)       # not zero
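To see the claim about residual plots directly, here is a minimal plotting sketch, reusing the objects created above; the left panel should look like pure random scatter, while the right panel should show a clear positive relationship:

# residuals vs fitted values: random scatter, correlation zero by construction
par(mfrow = c(1, 2))
plot(y_hat, error, xlab = "Fitted values", ylab = "Residual",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# residuals vs observed Y: systematically larger residuals for extreme Y
plot(Y, error, xlab = "Observed Y", ylab = "Residual",
     main = "Residuals vs observed Y")
abline(h = 0, lty = 2)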
You seem to have made a small mistake. The variance of the residuals is $7854.5/15 = 523.63$ (you appear to have divided by $15$ twice).
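As a quick check in R (assuming salesmod is your fitted model, with the 15 residual degrees of freedom from your output):

sum(salesmod$residuals^2) / df.residual(salesmod)  # 7854.5 / 15 = 523.63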
Now, what you are looking for is the distribution of the estimator of the variance of the true errors ($\varepsilon$), so that you can construct a confidence interval for it.
First, let $\boldsymbol{\varepsilon} \sim N(\mathbf{0},\sigma^2 I)$, and let $M = I - X(X'X)^{-1}X'$ be the "residual maker" matrix, which is symmetric and idempotent and satisfies $\mathbf{e} = M\boldsymbol{\varepsilon}$. The estimate of $\sigma^2$ is then (where $K$ is the number of parameters in the model):
\begin{align}
s^2 \equiv \frac{\mathbf{e'e}}{n-K} = \frac{\boldsymbol{\varepsilon'}M\boldsymbol{\varepsilon}}{n-K}
\end{align}
For this you can use the following property (taken directly from this wiki page):

If $Y$ is a vector of $k$ i.i.d. standard normal random variables and $A$ is a $k \times k$ symmetric, idempotent matrix with rank $k-n$, then the quadratic form $Y^{T}AY$ is chi-square distributed with $k-n$ degrees of freedom.
Since $M$ is idempotent and symmetric with rank $n-K$, and $\boldsymbol{\varepsilon}/\sigma$ is a vector of i.i.d. standard normal variables, it is clear that:
$$\Big(\frac{\boldsymbol{\varepsilon'}}{\sigma}\Big)M\Big(\frac{\boldsymbol{\varepsilon}}{\sigma}\Big) = \frac{n-K}{\sigma^2}s^2 \sim \chi^2_{(n-K)}$$
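As an aside, this distributional result is easy to verify numerically. Here is a minimal Monte Carlo sketch; the design matrix, sample size, and $\sigma$ below are arbitrary choices for illustration:

set.seed(1)
n <- 20; K <- 5; sigma <- 3
X <- cbind(1, matrix(rnorm(n * (K - 1)), n, K - 1))  # arbitrary design matrix
beta <- rnorm(K)

# simulate (n - K) * s^2 / sigma^2 repeatedly
stat <- replicate(5000, {
  y <- drop(X %*% beta) + rnorm(n, sd = sigma)
  fit <- lm(y ~ X - 1)
  sum(resid(fit)^2) / sigma^2
})

mean(stat)                            # close to n - K = 15, the chi-square mean
ks.test(stat, "pchisq", df = n - K)   # no evidence against the chi-square distribution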
This can be used now to construct the confidence interval:
$$Pr\bigg(q_{\alpha/2}<\frac{n-K}{\sigma^2}s^2 < q_{1-\alpha/2}\bigg)=1-\alpha$$
$$Pr\bigg(\frac{n-K}{q_{1-\alpha/2}}s^2 < \sigma^2 < \frac{n-K}{q_{\alpha/2}}s^2 \bigg) = 1-\alpha$$
where $q_p$ is the $p$th quantile of the $\chi^2_{(n-K)}$ distribution. Based on your data, we can use qchisq in R to get these quantiles.
# lower bound:
sum(salesmod$residuals^2) / qchisq(0.975, df = 15)
[1] 285.7373
# upper bound:
sum(salesmod$residuals^2) / qchisq(0.025, df = 15)
[1] 1254.277
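For reuse, the same computation can be wrapped in a small helper; the function name and interface here are my own hypothetical choices:

# confidence interval for the error variance sigma^2 from a fitted lm object
var_ci <- function(fit, level = 0.95) {
  ssr <- sum(resid(fit)^2)   # equals (n - K) * s^2
  df  <- df.residual(fit)    # n - K
  a   <- (1 - level) / 2
  c(lower = ssr / qchisq(1 - a, df),
    upper = ssr / qchisq(a, df))
}
# var_ci(salesmod) reproduces the bounds above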
Best Answer
Unless there is some underlying lesson or instruction I am missing, I find your instructor's approach a bit silly here. When computing the "variance" of a sample of observed quantities, we are really trying to form an estimator of the variance of the underlying random variable it represents. In my view, it is therefore more sensible to view the statistic you computed (the unbiased error variance estimator) as the proper "variance" in this case.$^\dagger$ The statistic your instructor is suggesting incorporates Bessel's correction for a standard IID sample, but the residuals are not a sample of that kind, and consequently the statistic he is proposing is not an unbiased estimator of anything useful here.
It is possible that your instructor wanted you to compute the "sample variance" of the residuals using the standard formula, perhaps to stress that this is not equivalent to the unbiased error variance estimator in this case. Perhaps he is trying to impart some lesson about the difference between the unbiased variance estimator in the IID case and the one appropriate to the regression model. In any case, you seem to understand the matter well, so don't sweat it if you were marked incorrect.
$^\dagger$ In the comments, whuber points out that the "variance" of a sample of values is sometimes regarded as the sum of squares divided by $n$ --- this definition comes from the fact that it is the variance of the empirical distribution of the sample. I am somewhat in the minority in the statistical profession in regarding this as a poor definition of the "variance" of a sample. In any case, this is not what your instructor is referring to.
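To make the distinction concrete, here is a minimal sketch contrasting the three denominators on a toy regression; the data are made up purely for illustration:

set.seed(42)
n <- 30
d <- data.frame(x = rnorm(n))
d$y <- 2 + 3 * d$x + rnorm(n)
fit <- lm(y ~ x, data = d)
e <- resid(fit)

sum(e^2) / n                  # variance of the empirical distribution (divide by n)
sum(e^2) / (n - 1)            # "sample variance" with Bessel's correction; equals var(e)
                              # since the residuals average to zero with an intercept
sum(e^2) / df.residual(fit)   # unbiased estimator of the error variance (divide by n - K)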