Points Outside Linear Regression Confidence Band

Tags: confidence-interval, prediction-interval, regression

I have done a linear regression of predicted measurements (of my model) vs. observed measurements and plotted the confidence band. Can I draw any conclusions about the points that lie outside the band?

If I am interpreting confidence bands correctly, a point that does not lie within the confidence band means there is a 95% chance that it is not within the range of the mean predicted value for that specific $x$ value (observed measurement), and nothing more (I cannot say anything else).

Best Answer

No, you essentially cannot infer anything from a data point lying outside the confidence band.

I think your interpretation of the confidence and the prediction bands may be off.

  • The 95% confidence band is a band that contains the true unknown mean response for a particular predictor value 95% of the time if you were to repeat your experiment many, many times.
  • The 95% prediction band is a band that contains 95% of future observable realizations if you were to repeat your experiment many, many times.
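The two definitions can be compared directly in R with `predict()` on an `lm` fit. This is a minimal sketch, assuming simulated data from the same process as the illustration further down (slope 0.5, noise standard deviation 0.2) and a hypothetical new predictor value of 0.5. Both intervals are centered on the same fitted mean, but the prediction interval is wider because it also covers the residual variation of a single new observation.

```r
# Simulated data (assumption: same data-generating process as the plotting example)
set.seed(1)
n <- 50
x <- runif(n, -1, 1)
y <- 0.5 * x + 0.2 * rnorm(n)
fit <- lm(y ~ x)

# Hypothetical new predictor value at which to compare the two 95% intervals
new_point <- data.frame(x = 0.5)
conf_int <- predict(fit, newdata = new_point, interval = "confidence", level = 0.95)
pred_int <- predict(fit, newdata = new_point, interval = "prediction", level = 0.95)

# Each row holds the fitted mean and the lower/upper interval limits
print(rbind(confidence = conf_int[1, ], prediction = pred_int[1, ]))
```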

Note the difference: the confidence band applies to unobservable parameter estimates, the prediction band to observables. The confidence band only includes the uncertainty in estimating the mean; the prediction band includes both this uncertainty and the residual variation around the mean. You may want to look at the corresponding tag wikis.
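For simple linear regression, the standard textbook pointwise intervals at a predictor value $x_0$ make this difference explicit. The notation is introduced here for illustration: $\hat y_0$ is the fitted mean at $x_0$, $s$ the residual standard error, and $t_{n-2}$ the 97.5% quantile of the $t$ distribution with $n-2$ degrees of freedom.

$$\text{confidence: } \hat y_0 \pm t_{n-2}\, s \sqrt{\frac{1}{n} + \frac{(x_0-\bar x)^2}{\sum_{i=1}^{n} (x_i-\bar x)^2}}$$

$$\text{prediction: } \hat y_0 \pm t_{n-2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar x)^2}{\sum_{i=1}^{n} (x_i-\bar x)^2}}$$

The extra $1$ under the square root is the residual variation. As $n$ grows, the other two terms go to zero, so the confidence band collapses onto the regression line while the prediction band settles at roughly $\hat y_0 \pm 1.96\, s$.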

Here is an illustration. Note how the confidence band shrinks as we increase the number $n$ of observations, because we can estimate the mean more and more precisely. The prediction band also shrinks, but much less, because while the uncertainty in estimating the mean decreases, the residual variation stays the same.
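To put numbers on the shrinking bands, here is a sketch that reuses the simulated setup from the plotting code below (an assumption carried over from that example) and reports the average width of each 95% band over the observed $x$ values for increasing $n$. The helper name `band_widths` is hypothetical. In-sample prediction intervals make `predict()` emit a warning, so it is suppressed here.

```r
# Average in-sample widths of the 95% confidence and prediction bands for a given n
band_widths <- function(n, seed = 1) {
  set.seed(seed)
  x <- runif(n, -1, 1)
  y <- 0.5 * x + 0.2 * rnorm(n)   # true slope 0.5, residual sd 0.2
  fit <- lm(y ~ x)
  conf <- predict(fit, interval = "confidence")
  pred <- suppressWarnings(predict(fit, interval = "prediction"))
  c(n = n,
    confidence = mean(conf[, "upr"] - conf[, "lwr"]),
    prediction = mean(pred[, "upr"] - pred[, "lwr"]))
}

# One row per sample size
widths <- t(sapply(c(20, 80, 320, 1280), band_widths))
print(round(widths, 3))

# Width the prediction band would have if the mean were known exactly
print(2 * qnorm(0.975) * 0.2)
```

Quadrupling $n$ should roughly halve the confidence-band width, while the prediction-band width levels off near `2 * qnorm(0.975) * 0.2`, about 0.78.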

[Figure: confidence vs. prediction bands]

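# Compare 95% confidence and prediction bands for n = 20 and n = 80 (one row each)
opar <- par(mfrow = c(2, 2))
for (nn in c(20, 80)) {
    set.seed(1)

    # Simulate data: true slope 0.5, residual standard deviation 0.2
    xx <- sort(runif(nn, -1, 1))
    yy <- 0.5 * xx + 0.2 * rnorm(nn)
    model <- lm(yy ~ xx)
    conf <- predict(model, interval = "confidence")
    # In-sample prediction intervals trigger a warning (see the note below)
    pred <- predict(model, interval = "prediction")

    # Left panel: confidence band (uncertainty in the estimated mean only)
    plot(xx, yy, type = "n", ylim = c(-1, 1), main = paste(nn, "data points, with confidence band"))
    polygon(c(xx, rev(xx)), c(conf[, "lwr"], rev(conf[, "upr"])), col = "lightgrey", border = NA)
    abline(model)
    points(xx, yy, pch = 19)

    # Right panel: prediction band (mean uncertainty plus residual variation)
    plot(xx, yy, type = "n", ylim = c(-1, 1), main = paste(nn, "data points, with prediction band"))
    polygon(c(xx, rev(xx)), c(pred[, "lwr"], rev(pred[, "upr"])), col = "darkgrey", border = NA)
    abline(model)
    points(xx, yy, pch = 19)
}
par(opar)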

(Note that you would typically not use a prediction interval for the observations you trained your model on, and R rightly warns about this. By contrast, looking at the confidence interval in-sample makes perfect sense.)
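As a check on the claim that a point outside the confidence band tells you little, this sketch counts the share of observed points falling outside each in-sample 95% band. It assumes the same simulated setup as the plotting code, and `share_outside` is a hypothetical helper name. Because the confidence band narrows toward the fitted line, the share of points outside it should grow toward 1 as $n$ increases, whereas only roughly 5% should fall outside the prediction band.

```r
# Share of the fitted data points lying outside each 95% band, for a given n
share_outside <- function(n, seed = 1) {
  set.seed(seed)
  x <- runif(n, -1, 1)
  y <- 0.5 * x + 0.2 * rnorm(n)   # true slope 0.5, residual sd 0.2
  fit <- lm(y ~ x)
  conf <- predict(fit, interval = "confidence")
  pred <- suppressWarnings(predict(fit, interval = "prediction"))
  c(n = n,
    outside_confidence = mean(y < conf[, "lwr"] | y > conf[, "upr"]),
    outside_prediction = mean(y < pred[, "lwr"] | y > pred[, "upr"]))
}

# One row per sample size
shares <- t(sapply(c(20, 80, 1000), share_outside))
print(round(shares, 3))
```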