Solved – Calculating Prediction Interval

confidence interval, prediction interval, r, regression

I have the following data located here. I am attempting to calculate the 95% confidence interval on the mean purity when the hydrocarbon percentage is 1.0. In R, I enter the following.

> predict(purity.lm, newdata=list(hydro=1.0), interval="confidence", level=.95)
       fit      lwr      upr
1 89.66431 87.51017 91.81845

However, how can I derive this result myself? I attempted to use the following equation.

$$s_{new}=\sqrt{s^2\left(1+\frac{1}{N}+\frac{(x_{new}-\bar x)^2}{\sum(x_i-\bar x)^2}\right)}$$

And I enter the following in R.

> SSE_line = sum((purity - (77.863 + 11.801*hydro))^2)
> MSE = SSE_line/18
> t.quantiles <- qt(c(.025, .975), 18)
> prediction = B0 + B1*1
> SE_predict = sqrt(MSE)*sqrt(1+1/20+(mean(hydro)-1)^2/sum((hydro - mean(hydro))^2))
> prediction + SE_predict*t.quantiles
[1] 81.80716 97.52146

My results are different from R's predict function. What am I misunderstanding about prediction intervals?

Best Answer

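Your predict.lm call computes a confidence interval for the fitted value (the mean response), whereas your hand calculation computes a prediction interval for a new observation. The difference is the leading 1 inside the square root of your formula: it adds the variance of a single new observation, and dropping it gives the standard error of the fitted mean. To reproduce predict.lm's confidence interval by hand, remove that 1 from SE_predict; to get your hand-calculated result from predict.lm instead, change interval="confidence" to interval="prediction".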
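To see the two intervals side by side, here is a minimal sketch. The asker's data set is not reproduced on this page, so the hydro and purity vectors below are simulated stand-ins (an assumption, as are the helper names ci_hand, pi_hand, and lev); the numbers will therefore not match the output in the question. Only the agreement between the hand formulas and predict is the point.

```r
# Simulated stand-in data (NOT the asker's data set)
set.seed(1)
hydro  <- runif(20, 0.8, 1.6)
purity <- 75 + 15 * hydro + rnorm(20, sd = 1)
purity.lm <- lm(purity ~ hydro)

n   <- length(hydro)
x0  <- 1.0                                      # hydrocarbon level of interest
s2  <- sum(resid(purity.lm)^2) / (n - 2)        # MSE on n - 2 degrees of freedom
fit <- sum(coef(purity.lm) * c(1, x0))          # fitted value at x0
lev <- (x0 - mean(hydro))^2 / sum((hydro - mean(hydro))^2)
tq  <- qt(c(0.025, 0.975), df = n - 2)

# Confidence interval for the mean response: no leading 1
ci_hand <- fit + tq * sqrt(s2 * (1 / n + lev))
# Prediction interval for a new observation: leading 1 included
pi_hand <- fit + tq * sqrt(s2 * (1 + 1 / n + lev))

new  <- data.frame(hydro = x0)
ci_r <- predict(purity.lm, newdata = new, interval = "confidence", level = 0.95)
pi_r <- predict(purity.lm, newdata = new, interval = "prediction", level = 0.95)

print(rbind(ci_hand, ci_r = ci_r[1, c("lwr", "upr")]))
print(rbind(pi_hand, pi_r = pi_r[1, c("lwr", "upr")]))
```

Each pair of rows should agree up to floating-point error, with the prediction interval the wider of the two.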

Related Solutions

LME4 in R – Prediction Interval for Mixed Effects Model

This question and excellent exchange was the impetus for creating the predictInterval function in the merTools package. bootMer is the way to go, but for some problems it is not feasible computationally to generate bootstrapped refits of the whole model (in cases where the model is large).

In those cases, predictInterval is designed to use the arm::sim functions to generate distributions of parameters in the model and then to use those distributions to generate simulated values of the response given the newdata provided by the user. It's simple to use -- all you would need to do is:

library(merTools)
preds <- predictInterval(lme1, newdata = newDat, n.sims = 999)

You can specify a whole host of other values to predictInterval including setting the interval for the prediction intervals, choosing whether to report the mean or median of the distribution, and choosing whether or not to include the residual variance from the model.

It's not a full prediction interval because the variability of the theta parameters in the lmer object is not included, but all of the other variation is captured by this method, giving a pretty decent approximation.
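As a rough illustration of the options described above, the sketch below fits a small model to the sleepstudy data shipped with lme4 and requests 95% intervals. The model formula and the choice of newDat are assumptions made for the example, not the original poster's setup; the arguments level, stat, and include.resid.var are the ones documented for predictInterval.

```r
library(lme4)
library(merTools)

# Hypothetical example model: random intercept per subject
lme1   <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
newDat <- sleepstudy[1:5, ]   # reuse a few observed rows as "new" data

set.seed(1)                   # the intervals are simulation-based
preds <- predictInterval(lme1, newdata = newDat,
                         level = 0.95,              # interval width
                         n.sims = 999,              # number of simulated draws
                         stat = "median",           # or "mean"
                         include.resid.var = TRUE)  # add residual variance
print(preds)                  # data frame with columns fit, upr, lwr
```

Setting include.resid.var = FALSE should give narrower intervals that reflect only the uncertainty in the fixed and random effects.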

Regression – Do Confidence and Prediction Intervals Shrink to a Point with Large Sample Sizes?

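Recall that consistency means that the estimator converges in probability to the parameter. This means that the distribution of the estimator becomes arbitrarily concentrated around the parameter as the sample size grows.

If you construct a confidence interval based on a consistent estimator, it will thus shrink toward a single point as the sample size grows. A prediction interval does not: it must also cover the noise in a single new observation, whose variance does not decrease with the sample size, so its width approaches a positive limit.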
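A small simulation can illustrate this for simple linear regression. It is a sketch under assumed settings (true line 75 + 15x, noise standard deviation 1, and the sample sizes below are all made up for the example), and it also shows the prediction interval for contrast: its width settles near 2 × 1.96 × σ instead of shrinking.

```r
set.seed(42)
sizes <- c(20, 200, 20000)

widths <- sapply(sizes, function(n) {
  x <- runif(n, 0.8, 1.6)
  y <- 75 + 15 * x + rnorm(n, sd = 1)   # assumed model, sigma = 1
  m <- lm(y ~ x)
  new <- data.frame(x = 1.0)
  ci  <- predict(m, new, interval = "confidence", level = 0.95)
  pri <- predict(m, new, interval = "prediction", level = 0.95)
  # Row 1: confidence interval width; row 2: prediction interval width
  c(ci[1, "upr"] - ci[1, "lwr"], pri[1, "upr"] - pri[1, "lwr"])
})
dimnames(widths) <- list(c("confidence", "prediction"), paste0("n=", sizes))
print(round(widths, 3))
```

The confidence row should drop toward zero as n grows, while the prediction row stays close to 2 * qnorm(0.975), because the noise in a single new observation does not average out.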
