I have the following data located here. I am attempting to calculate the 95% confidence interval on the mean purity when the hydrocarbon percentage is 1.0. In R, I enter the following.
> predict(purity.lm, newdata=list(hydro=1.0), interval="confidence", level=.95)
       fit      lwr      upr
1 89.66431 87.51017 91.81845
However, how can I derive this result myself? I attempted to use the following equation.
$$s_{new}=\sqrt{s^2\left(1+\frac{1}{N}+\frac{(x_{new}-\bar x)^2}{\sum(x_i-\bar x)^2}\right)}$$
And I enter the following in R.
> SSE_line = sum((purity - (77.863 + 11.801*hydro))^2)
> MSE = SSE_line/18
> t.quantiles <- qt(c(.025, .975), 18)
> prediction = B0 + B1*1
> SE_predict = sqrt(MSE)*sqrt(1+1/20+(mean(hydro)-1)^2/sum((hydro - mean(hydro))^2))
> prediction + SE_predict*t.quantiles
[1] 81.80716 97.52146
My results are different from R's predict function. What am I misunderstanding about prediction intervals?
Best Answer
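Your predict.lm code is calculating a confidence interval for the fitted value, that is, for the mean purity at hydro = 1.0. Your hand calculation is calculating a prediction interval for a single new observation, which is wider because the leading 1 inside the square root adds the variance of an individual response. If you want to get the same result from predict.lm that you got from the hand calculation, change interval="confidence" to interval="prediction". To reproduce the confidence interval by hand instead, drop the leading 1 from your formula, so that the standard error becomes $\sqrt{s^2\left(\frac{1}{N}+\frac{(x_{new}-\bar x)^2}{\sum(x_i-\bar x)^2}\right)}$.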
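As a rough sketch of the difference between the two intervals, the hand calculation can be wrapped in a small function and checked against predict. The function name interval_by_hand and the simulated x and y are assumptions made only for illustration. They are not the purity data from the question, so the numbers will not match the output shown there.

```r
# Hand-computed interval for a simple linear regression y ~ x at x_new.
# type = "confidence": interval for the mean response (no leading 1).
# type = "prediction": interval for one new observation (leading 1).
# interval_by_hand is a hypothetical helper, not part of base R.
interval_by_hand <- function(x, y, x_new, level = 0.95,
                             type = c("confidence", "prediction")) {
  type  <- match.arg(type)
  n     <- length(x)
  fit   <- lm(y ~ x)
  b     <- coef(fit)
  mse   <- sum(resid(fit)^2) / (n - 2)   # s^2 on n - 2 degrees of freedom
  sxx   <- sum((x - mean(x))^2)
  extra <- if (type == "prediction") 1 else 0
  se    <- sqrt(mse * (extra + 1 / n + (x_new - mean(x))^2 / sxx))
  est   <- unname(b[1] + b[2] * x_new)
  tq    <- qt(1 - (1 - level) / 2, df = n - 2)
  c(fit = est, lwr = est - tq * se, upr = est + tq * se)
}

# Placeholder data (simulated, NOT the purity data from the question)
set.seed(1)
x <- runif(20, 0.8, 1.6)
y <- 78 + 12 * x + rnorm(20)

by_hand <- interval_by_hand(x, y, x_new = 1.0, type = "confidence")
from_r  <- predict(lm(y ~ x), newdata = data.frame(x = 1.0),
                   interval = "confidence", level = 0.95)

# Should agree up to floating-point tolerance
all.equal(unname(by_hand), unname(from_r[1, ]))
```

With the real data, calling interval_by_hand(hydro, purity, 1.0, type = "confidence") should reproduce the predict output from the question, and type = "prediction" should reproduce the wider hand-computed interval.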