EDIT: It appears that for this particular problem I made a simple arithmetic error. I am now getting the same results. What prompted me to write this question was that this is the third time that I've had this problem. However, in the past, I was mistakenly using interval="predict"
when I should have been using interval="confidence"
.
I am trying to compute a 95% confidence interval for a mean response on a small dataset, yet when I calculate this manually I get a very different interval. How is R calculating the interval when using predict.lm
? Am I using the wrong function call? I verified that my manual calculation is correct with the solutions manual.
Computing a 95% confidence interval for a mean response at $x_0 = 170$ for the following dataset:
x y
1 170 162.50
2 140 144.00
3 180 147.50
4 160 163.50
5 170 192.00
6 150 171.75
7 170 162.00
8 110 104.83
9 120 105.67
10 130 117.58
11 120 140.25
12 140 150.17
13 160 165.17
I calculated the confidence interval using the following equation from my book:
$$
\hat\mu_{y|x_0}\pm t_{\alpha/2, n-2}\sqrt{MS_{\rm res}\left(\frac 1 n + \frac{(x_0-\bar x)^2}{S_{xx}} \right)}
$$
And I got an interval of $168.36\pm23.465$ (or alternatively, $(144.895, 191.825)$). When I did this manual calculation, the MSRes I calculated is the same as the output from calling anova(model)
. Additionally, I calculated Sxx in R as follows:
# Results are equivalent:
sum((data$x - mean(data$x))^2) # = 6230
(length(data$x)-1)*var(data$x) # = 6230
However, when I calculate this in R, I get the same fitted value but a different interval:
> predict.lm(data.lm, newdata=data.frame(x=170), interval="confidence", level=0.95)
fit lwr upr
1 168.3611 153.9071 182.8152
Why is R giving me an incorrect interval? Or is my textbook simply wrong? This also applies to a 95% prediction interval on the same $x_0$, using the following equation:
$$
\hat y_0 \pm t_{\alpha/2, n-2}\sqrt{MS_{\rm res}\left(1 + \frac 1 n + \frac{(x_0-\bar x)^2}{S_{xx}} \right)}
$$
Best Answer
It's hard to guess what you might have done wrong since
Sxx
andMSres
are correct. Perhaps the t-value? Here is the calculation done by hand in R to confirm that the R answer agrees with the formula: