Solved – Calculating Prediction Interval

confidence interval, prediction interval, r, regression

I have the following data located here. I am attempting to calculate the 95% confidence interval on the mean purity when the hydrocarbon percentage is 1.0. In R, I enter the following.

> predict(purity.lm, newdata=list(hydro=1.0), interval="confidence", level=.95)
   fit      lwr      upr
1 89.66431 87.51017 91.81845

However, how can I derive this result myself? I attempted to use the following equation.

$$s_{new}=\sqrt{s^2\left(1+\frac{1}{N}+\frac{(x_{new}-\bar x)^2}{\sum(x_i-\bar x)^2}\right)}$$

And I enter the following in R.

> SSE_line = sum((purity - (77.863 + 11.801*hydro))^2)
> MSE = SSE_line/18
> t.quantiles <- qt(c(.025, .975), 18)
> prediction = B0 + B1*1
> SE_predict = sqrt(MSE)*sqrt(1+1/20+(mean(hydro)-1)^2/sum((hydro - mean(hydro))^2))
> prediction + SE_predict*t.quantiles
[1] 81.80716 97.52146

My results are different from R's predict function. What am I misunderstanding about prediction intervals?

Best Answer

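Your predict.lm call computes a confidence interval for the mean response (the fitted value) at hydro = 1.0, whereas your hand calculation computes a prediction interval for a single new observation. The difference is the leading 1 inside the square root of your formula: it adds the variance of one new observation on top of the uncertainty in the fitted line, which is why your interval is wider. Dropping that 1, so that the standard error is $s\sqrt{\frac{1}{N}+\frac{(x_{new}-\bar x)^2}{\sum(x_i-\bar x)^2}}$, gives the confidence interval that predict.lm reported. If instead you want predict.lm to reproduce your hand calculation, change interval="confidence" to interval="prediction".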
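As a rough check of the distinction, the sketch below computes both intervals by hand and compares them with predict. The original data are not reproduced in this post, so the hydro and purity values here are simulated purely for illustration; the variable names (lever, ci_hand, pi_hand) are made up for this example and the numbers will not match the output shown in the question.

```r
# Simulated data for illustration only; NOT the purity data from the question.
set.seed(1)
hydro  <- runif(20, 0.8, 1.6)
purity <- 78 + 12 * hydro + rnorm(20, sd = 3)

fit <- lm(purity ~ hydro)
n   <- length(hydro)
x0  <- 1.0                                   # point at which to predict

s     <- sqrt(sum(resid(fit)^2) / (n - 2))   # residual standard error, sqrt(MSE)
tq    <- qt(c(.025, .975), df = n - 2)       # t quantiles for a 95% interval
y0    <- sum(coef(fit) * c(1, x0))           # fitted value B0 + B1 * x0
lever <- 1/n + (x0 - mean(hydro))^2 / sum((hydro - mean(hydro))^2)

ci_hand <- y0 + tq * s * sqrt(lever)         # confidence interval for the mean response
pi_hand <- y0 + tq * s * sqrt(1 + lever)     # prediction interval for a new observation

ci_r <- predict(fit, newdata = data.frame(hydro = x0),
                interval = "confidence", level = .95)
pi_r <- predict(fit, newdata = data.frame(hydro = x0),
                interval = "prediction", level = .95)

# Compare the hand calculations with predict()
all.equal(ci_hand, unname(ci_r[1, c("lwr", "upr")]))
all.equal(pi_hand, unname(pi_r[1, c("lwr", "upr")]))
```

The only difference between the two hand calculations is the extra 1 under the square root, so the prediction interval is always the wider of the two.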