Solved – How to: Prediction intervals for linear regression via bootstrapping

bootstrap, prediction interval, regression

I am having trouble understanding how to use bootstrapping to calculate prediction intervals for a linear regression model. Can somebody outline a step-by-step procedure? I searched via Google, but nothing I found really makes sense to me.

I do understand how to use bootstrapping for calculating confidence intervals for the model parameters.

Best Answer

Confidence intervals account for estimation uncertainty. Prediction intervals add the fundamental uncertainty on top of that. R's predict.lm will give you the prediction interval for a linear model. From there, all you have to do is run it repeatedly on bootstrapped samples.

n    <- 100   # number of observations
n.bs <- 30    # number of bootstrap replicates

# simulate some data
x   <- runif(n)
dat <- data.frame(x = x, y = x + runif(n))
plot(y ~ x, data = dat)

# fit the model and return the 95% prediction interval at the observed points
regressAndPredict <- function(dat) {
  model <- lm(y ~ x, data = dat)
  predict(model, interval = "prediction")
}

regressAndPredict(dat)

# repeat on bootstrap resamples of the rows
boot.pred <- replicate(n.bs, regressAndPredict(dat[sample(seq(n), replace = TRUE), ]))

The result of replicate is a 3-dimensional array (n x 3 x n.bs). The length-3 dimension holds, for each data element, the fitted value and the lower and upper bounds of the 95% prediction interval.
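Note that each bootstrap replicate predicts at its own resampled x values. If you want the replicates to line up point by point, one option (my addition, not part of the answer above; the names new.x, regressAndPredictGrid, and boot.grid are made up for illustration) is to predict every bootstrapped fit on a common grid of x values and then summarize across the third dimension:

# predict each bootstrapped fit on the same grid of x values,
# so the replicates line up observation by observation
new.x <- data.frame(x = seq(0, 1, length.out = 50))

regressAndPredictGrid <- function(dat, newdata) {
  model <- lm(y ~ x, data = dat)
  predict(model, newdata = newdata, interval = "prediction")
}

boot.grid <- replicate(
  n.bs,
  regressAndPredictGrid(dat[sample(seq(n), replace = TRUE), ], new.x)
)

# boot.grid is 50 x 3 x n.bs; e.g. average the interval bounds over replicates
lwr <- apply(boot.grid[, "lwr", ], 1, mean)
upr <- apply(boot.grid[, "upr", ], 1, mean)

plot(y ~ x, data = dat)
lines(new.x$x, lwr, lty = 2)
lines(new.x$x, upr, lty = 2)

How you combine the replicates (averaging the bounds, taking their extremes, or pooling simulated draws) is a separate choice and not prescribed by the answer above.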

Gary King method

Depending on what you want, there's a cool method by King, Tomz, and Wittenberg. It's relatively easy to implement, and avoids the problems of bootstrapping for certain estimates (e.g. max(Y)).

I'll quote from their definition of fundamental uncertainty here, since it's reasonably nice:

A second form of variability, the fundamental uncertainty represented by the stochastic component (the distribution f) in Equation 1, results from innumerable chance events such as weather or illness that may influence Y but are not included in X. Even if we knew the exact values of the parameters (thereby eliminating estimation uncertainty), fundamental uncertainty would prevent us from predicting Y without error.
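For concreteness, here is a rough sketch of that simulation idea applied to the linear model above (the function simulatePredictions and its details are my own illustration, not code from King, Tomz, and Wittenberg; in particular it treats the residual standard deviation as known, which slightly understates the uncertainty):

library(MASS)   # for mvrnorm

simulatePredictions <- function(model, newdata, n.sims = 1000) {
  # estimation uncertainty: draw coefficient vectors from their
  # approximate sampling distribution
  beta.sims <- mvrnorm(n.sims, mu = coef(model), Sigma = vcov(model))

  # systematic component at the new covariate values
  X <- model.matrix(delete.response(terms(model)), data = newdata)
  mu.sims <- X %*% t(beta.sims)     # rows: new points, columns: simulations

  # fundamental uncertainty: add a draw from the stochastic component
  sigma.hat <- summary(model)$sigma
  y.sims <- mu.sims + rnorm(length(mu.sims), sd = sigma.hat)

  # summarize the simulated outcomes into a 95% interval per new point
  t(apply(y.sims, 1, quantile, probs = c(0.025, 0.5, 0.975)))
}

model <- lm(y ~ x, data = dat)
simulatePredictions(model, data.frame(x = c(0.25, 0.5, 0.75)))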
