By construction, the residuals in an OLS model (the in-sample estimates of the error term) are uncorrelated with the observed values of the X covariates. This holds for the observed data even when the model yields biased estimates because one of its assumptions is violated (say, an omitted variable or reverse causality). Since the predicted values are entirely a function of these covariates, they are also uncorrelated with the residuals. Thus, when you plot residuals against predicted values, they should always look random, because they are indeed uncorrelated by construction of the estimator. In contrast, it is entirely possible (and indeed probable) for the residuals to be correlated with Y in practice. For example, with a dichotomous X variable, the farther the true Y is from either $E(Y \mid X = 1)$ or $E(Y \mid X = 0)$, the larger the residual will be. Here is the same intuition with simulated data in R, where we know the model is unbiased because we control the data-generating process:
rm(list = ls())
set.seed(21391209)
trueSd <- 10
trueA <- 5
trueB <- as.matrix(c(3, 5, -1, 0))
sampleSize <- 100

# create x-values (x3 and x4 are deliberately correlated with x1 and x2)
x1 <- rnorm(n = sampleSize, mean = 0, sd = 4)
x2 <- rnorm(n = sampleSize, mean = 5, sd = 10)
x3 <- 3 + x1 * 4 + x2 * 2 + rnorm(n = sampleSize, mean = 0, sd = 10)
x4 <- -50 + x1 * 7 + x2 * .5 + x3 * 2 + rnorm(n = sampleSize, mean = 0, sd = 20)
X <- as.matrix(cbind(x1, x2, x3, x4))

# create dependent values according to a + Xb + N(0, sd)
Y <- trueA + X %*% trueB + rnorm(n = sampleSize, mean = 0, sd = trueSd)

df <- as.data.frame(cbind(Y, X))
colnames(df) <- c("y", "x1", "x2", "x3", "x4")

ols <- lm(y ~ x1 + x2 + x3 + x4, data = df)
y_hat <- predict(ols, df)
error <- Y - y_hat

cor(y_hat, error)  # zero (up to numerical precision)
cor(Y, error)      # not zero
We get the same result of zero correlation with the fitted values even from a biased model, for example if we omit x1:
ols2 <- lm(y ~ x2 + x3 + x4, data = df)
y_hat2 <- predict(ols2, df)
error2 <- Y - y_hat2

cor(y_hat2, error2)  # still zero
cor(Y, error2)       # not zero
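To see the claim about residual plots directly, here is a minimal plotting sketch, reusing the objects created above; the left panel should look like pure random scatter, while the right panel should show a clear positive relationship:

# residuals vs fitted values: random scatter, correlation zero by construction
par(mfrow = c(1, 2))
plot(y_hat, error, xlab = "Fitted values", ylab = "Residual",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# residuals vs observed Y: systematically larger residuals for extreme Y
plot(Y, error, xlab = "Observed Y", ylab = "Residual",
     main = "Residuals vs observed Y")
abline(h = 0, lty = 2)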
You seem to have made a small mistake. The variance of the residuals is $7854.5/15 = 523.63$ (you appear to have divided by $15$ twice).
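As a quick check in R (assuming salesmod is your fitted model, with the 15 residual degrees of freedom from your output):

sum(salesmod$residuals^2) / df.residual(salesmod)  # 7854.5 / 15 = 523.63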
Now, what you are looking for is the distribution of the estimator of the variance of the true errors ($\varepsilon$), so that you can construct a confidence interval for it.
First, let $\boldsymbol{\varepsilon} \sim N(\mathbf{0},\sigma^2 I)$, and let $M = I - X(X'X)^{-1}X'$ be the "residual maker" matrix, which is symmetric and idempotent and satisfies $\mathbf{e} = M\boldsymbol{\varepsilon}$. The estimate of $\sigma^2$ is then (where $K$ is the number of parameters in the model):
\begin{align}
s^2 \equiv \frac{\mathbf{e'e}}{n-K} = \frac{\boldsymbol{\varepsilon'}M\boldsymbol{\varepsilon}}{n-K}
\end{align}
For this you can use the following property (taken directly from this wiki page):

If $Y$ is a vector of $k$ i.i.d. standard normal random variables and $A$ is a $k \times k$ symmetric, idempotent matrix with rank $k-n$, then the quadratic form $Y^{T}AY$ is chi-square distributed with $k-n$ degrees of freedom.
Since $M$ is idempotent and symmetric with rank $n-K$, and $\boldsymbol{\varepsilon}/\sigma$ is a vector of i.i.d. standard normal variables, it is clear that:
$$\Big(\frac{\boldsymbol{\varepsilon'}}{\sigma}\Big)M\Big(\frac{\boldsymbol{\varepsilon}}{\sigma}\Big) = \frac{n-K}{\sigma^2}s^2 \sim \chi^2_{(n-K)}$$
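As an aside, this distributional result is easy to verify numerically. Here is a minimal Monte Carlo sketch; the design matrix, sample size, and $\sigma$ below are arbitrary choices for illustration:

set.seed(1)
n <- 20; K <- 5; sigma <- 3
X <- cbind(1, matrix(rnorm(n * (K - 1)), n, K - 1))  # arbitrary design matrix
beta <- rnorm(K)

# simulate (n - K) * s^2 / sigma^2 repeatedly
stat <- replicate(5000, {
  y <- drop(X %*% beta) + rnorm(n, sd = sigma)
  fit <- lm(y ~ X - 1)
  sum(resid(fit)^2) / sigma^2
})

mean(stat)                            # close to n - K = 15, the chi-square mean
ks.test(stat, "pchisq", df = n - K)   # no evidence against the chi-square distribution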
This can be used now to construct the confidence interval:
$$Pr\bigg(q_{\alpha/2}<\frac{n-K}{\sigma^2}s^2 < q_{1-\alpha/2}\bigg)=1-\alpha$$
$$Pr\bigg(\frac{n-K}{q_{1-\alpha/2}}s^2 < \sigma^2 < \frac{n-K}{q_{\alpha/2}}s^2 \bigg) = 1-\alpha$$
where $q_p$ is the $p$th quantile of the $\chi^2_{(n-K)}$ distribution. Based on your data, we can use qchisq in R to get these quantiles.
# lower bound:
sum(salesmod$residuals^2) / qchisq(0.975, df = 15)
[1] 285.7373
# upper bound:
sum(salesmod$residuals^2) / qchisq(0.025, df = 15)
[1] 1254.277
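For reuse, the same computation can be wrapped in a small helper; the function name and interface here are my own hypothetical choices:

# confidence interval for the error variance sigma^2 from a fitted lm object
var_ci <- function(fit, level = 0.95) {
  ssr <- sum(resid(fit)^2)   # equals (n - K) * s^2
  df  <- df.residual(fit)    # n - K
  a   <- (1 - level) / 2
  c(lower = ssr / qchisq(1 - a, df),
    upper = ssr / qchisq(a, df))
}
# var_ci(salesmod) reproduces the bounds above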
Best Answer
Unless there is some underlying lesson or instruction I am missing, I find your instructor's approach a bit silly here. When computing the "variance" of a sample of observed quantities, we are really trying to form an estimator of the variance of the underlying random variable it represents. In my view, it is therefore more sensible to view the statistic you computed (the unbiased error variance estimator) as the proper "variance" in this case.$^\dagger$ The statistic your instructor is suggesting incorporates Bessel's correction for a standard IID sample, but the residuals are not a sample of that kind, and consequently the statistic he is proposing is not an unbiased estimator of anything useful here.
It is possible that your instructor wanted you to compute the "sample variance" of the residuals using the standard formula, perhaps to stress that this is not equivalent to the unbiased error variance estimator in this case. Perhaps he is trying to impart some lesson about the difference between the unbiased variance estimator in the IID case and the one appropriate to the regression model. In any case, you seem to understand the matter well, so don't sweat it if you were marked incorrect.
$^\dagger$ In the comments, whuber points out that the "variance" of a sample of values is sometimes regarded as the sum of squares divided by $n$ --- this definition comes from the fact that it is the variance of the empirical distribution of the sample. I am somewhat in the minority in the statistical profession in regarding this as a poor definition of the "variance" of a sample. In any case, this is not what your instructor is referring to.
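To make the distinction concrete, here is a minimal sketch contrasting the three denominators on a toy regression; the data are made up purely for illustration:

set.seed(42)
n <- 30
d <- data.frame(x = rnorm(n))
d$y <- 2 + 3 * d$x + rnorm(n)
fit <- lm(y ~ x, data = d)
e <- resid(fit)

sum(e^2) / n                  # variance of the empirical distribution (divide by n)
sum(e^2) / (n - 1)            # "sample variance" with Bessel's correction; equals var(e)
                              # since the residuals average to zero with an intercept
sum(e^2) / df.residual(fit)   # unbiased estimator of the error variance (divide by n - K)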