I have three variables:
- Number of house sales
- Month (in couples)
- Region of a city (N-W-E-S)
and I want to calculate confidence intervals for the residual of the errors. So, given the data:
month <- c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3",
"3", "4", "4", "4", "4", "5", "5", "5", "5", "6", "6", "6",
"6")
region <-c("1", "2", "3", "4", "1", "2", "3", "4", "1", "2", "3",
"4", "1", "2", "3", "4", "1", "2", "3", "4", "1", "2", "3",
"4")
sales <-c(85, 107, 61, 22, 40, 65, 58, 51, 60, 41, 45, 27, 15,
30, 68, 63, 28, 3, 57, 12, 36, 21, 10, 16)
data <- cbind(sales, month, region)
data <- as.data.frame(data)
salesmod <- lm(sales ~ month + region, data=data)
summary(salesmod)
anova(salesmod)
We can check the degrees of freedom and the sum of the residuals from the summary(salesmod) and anova(salesmod).
Call:
lm(formula = sales ~ month + region, data = data)
Residual standard error: 22.88 on 15 degrees of freedom
Multiple R-squared: 0.4855, Adjusted R-squared: 0.2111
F-statistic: 1.769 on 8 and 15 DF, p-value: 0.1623
> anova(salesmod)
Analysis of Variance Table
Response: sales
Df Sum Sq Mean Sq F value Pr(>F)
month 5 6368.7 1273.74 2.4325 0.0835 .
region 3 1042.8 347.60 0.6638 0.5871
Residuals 15 7854.5 523.63
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
So the variance of the residuals would be $523.63/15=34.90867$, but how do I compute a confidence intervals for this value (of given 95% confidence).
Best Answer
You seem to have made a small mistake. The variance of residuals is $7854.5/15=523.63$ (you have divided twice).
Now, what you are looking for is distribution of the estimate of the variance of true errors ($\varepsilon$) so that you can construct a confidence interval for it.
First let $\boldsymbol{\varepsilon} \sim N(\mathbf{0},\sigma^2I)$. Now the estimate for $\sigma^2$ is (where $K$ is the number of paramters in the model):
\begin{align} s^2 \equiv \frac{\mathbf{e'e}}{n-K} = \frac{\boldsymbol{\varepsilon'}M\boldsymbol{\varepsilon}}{n-K} \end{align}
So for this you can use the following property that (taken directly from this wiki page)
Since $M$ is idempotent and symmetric with rank $n-K$, it is clear that:
$$\Big(\frac{\boldsymbol{\varepsilon'}}{\sigma}\Big)M\Big(\frac{\boldsymbol{\varepsilon}}{\sigma}\Big) = \frac{n-K}{\sigma^2}s^2 \sim \chi^2_{(n-K)}$$
This can be used now to construct the confidence interval:
$$Pr\bigg(q_{\alpha/2}<\frac{n-K}{\sigma^2}s^2 < q_{1-\alpha/2}\bigg)=1-\alpha$$
$$Pr\bigg(\frac{n-K}{q_{1-\alpha/2}}s^2 < \sigma^2 < \frac{n-K}{q_{\alpha/2}}s^2 \bigg) = 1-\alpha$$
where $q_p$ is the $p^{th}$ quantile of $\chi^2_{(n-K)}$
Based on your data, we can use
qchisq
in R to get these quantilies.