Solved – How is R calculating prediction intervals for the mean response at x0? Manual calculation yields different results

confidence intervalrregression

EDIT: It appears that for this particular problem I made a simple arithmetic error. I am now getting the same results. What prompted me to write this question was that this is the third time that I've had this problem. However, in the past, I was mistakenly using interval="predict" when I should have been using interval="confidence".

I am trying to compute a 95% confidence interval for a mean response on a small dataset, yet when I calculate this manually I get a very different interval. How is R calculating the interval when using predict.lm? Am I using the wrong function call? I verified that my manual calculation is correct with the solutions manual.

Computing a 95% confidence interval for a mean response at $x_0 = 170$ for the following dataset:

      x        y
1   170   162.50
2   140   144.00
3   180   147.50
4   160   163.50
5   170   192.00
6   150   171.75
7   170   162.00
8   110   104.83
9   120   105.67
10  130   117.58
11  120   140.25
12  140   150.17
13  160   165.17

I calculated the confidence interval using the following equation from my book:
$$
\hat\mu_{y|x_0}\pm t_{\alpha/2, n-2}\sqrt{MS_{\rm res}\left(\frac 1 n + \frac{(x_0-\bar x)^2}{S_{xx}} \right)}
$$
And I got an interval of $168.36\pm23.465$ (or alternatively, $(144.895, 191.825)$). When I did this manual calculation, the MSRes I calculated is the same as the output from calling anova(model). Additionally, I calculated Sxx in R as follows:

# Results are equivalent:
sum((data$x - mean(data$x))^2) # = 6230
(length(data$x)-1)*var(data$x) # = 6230

However, when I calculate this in R, I get the same fitted value but a different interval:

> predict.lm(data.lm, newdata=data.frame(x=170), interval="confidence", level=0.95)
       fit      lwr      upr
1 168.3611 153.9071 182.8152

Why is R giving me an incorrect interval? Or is my textbook simply wrong? This also applies to a 95% prediction interval on the same $x_0$, using the following equation:
$$
\hat y_0 \pm t_{\alpha/2, n-2}\sqrt{MS_{\rm res}\left(1 + \frac 1 n + \frac{(x_0-\bar x)^2}{S_{xx}} \right)}
$$

Best Answer

It's hard to guess what you might have done wrong since Sxx and MSres are correct. Perhaps the t-value? Here is the calculation done by hand in R to confirm that the R answer agrees with the formula:

data <- structure(list(x = c(170, 140, 180, 160, 170, 150, 170, 110, 
120, 130, 120, 140, 160), y = c(162.5, 144, 147.5, 163.5, 192, 
171.75, 162, 104.83, 105.67, 117.58, 140.25, 150.17, 165.17)), .Names = c("x", 
"y"), row.names = c(NA, -13L), class = "data.frame")

data.lm <- lm(y~x, data=data)

# get prediction
pred <- predict.lm(data.lm, newdata=data.frame(x=170), interval="confidence", level=0.95)

# correct t-value
tval <- qt((1-0.95)/2, df=13-2)

# correct Sxx
Sxx <- sum((data$x - mean(data$x))^2)

# correct MSres - note division is by the number of df (but you have this right anyway)
MSres <- sum(data.lm$residuals^2)/11

# confidence interval calculated by hand
sqrt(MSres * (1/13 + (170 - mean(x))^2/Sxx)) * tval * c(1, -1) + pred[1]

# compare with R interval calculation
pred
Related Question