How does ggplot compute confidence intervals for regressions?

Tags: confidence-interval, ggplot2, r, regression

The R plotting package ggplot2 has an awesome function called stat_smooth for plotting a regression line (or curve) with the associated confidence band.

However, I am having a hard time figuring out exactly how this confidence band is generated for each type of regression line (or "method"). How can I find this information?

Best Answer

From the Details section of the help page (?stat_smooth):

Calculation is performed by the (currently undocumented) predictdf generic function and its methods. For most methods the confidence bounds are computed using the predict method - the exceptions are loess which uses a t-based approximation, and for glm where the normal confidence interval is constructed on the link scale, and then back-transformed to the response scale.
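The two exceptions can be reproduced with base R. This is a sketch only, assuming the built-in mtcars data and illustrative variable names (lo_fit, gl_fit, nd and so on are not ggplot2 names); it follows the help text's description rather than ggplot2's own source. For loess the band is the fit plus or minus a t quantile (using the approximate df that predict.loess reports) times the standard error; for a binomial glm a normal interval is built on the link scale and each bound is then passed through the inverse link.

```r
# Illustrative data: the built-in mtcars set; names below are hypothetical
level <- 0.95
nd <- data.frame(wt = seq(2, 5, length.out = 5))

# loess: predict(se = TRUE) returns fit, se.fit and an approximate df,
# and the band is fit +/- qt(level / 2 + 0.5, df) * se.fit
lo_fit <- loess(mpg ~ wt, data = mtcars)
lo_pred <- predict(lo_fit, newdata = nd, se = TRUE)
lo_half <- lo_pred$se.fit * qt(level / 2 + 0.5, lo_pred$df)
lo_band <- data.frame(x = nd$wt, y = lo_pred$fit,
                      ymin = lo_pred$fit - lo_half,
                      ymax = lo_pred$fit + lo_half)

# glm: normal interval on the link (logit) scale, then each bound is
# back-transformed to the probability scale with the inverse link
gl_fit <- glm(am ~ wt, family = binomial, data = mtcars)
gl_pred <- predict(gl_fit, newdata = nd, type = "link", se.fit = TRUE)
z <- qnorm(level / 2 + 0.5)
inv_link <- family(gl_fit)$linkinv
gl_band <- data.frame(x = nd$wt, y = inv_link(gl_pred$fit),
                      ymin = inv_link(gl_pred$fit - z * gl_pred$se.fit),
                      ymax = inv_link(gl_pred$fit + z * gl_pred$se.fit))
```

Because the glm bounds are back-transformed rather than computed on the response scale, they stay inside (0, 1) and are generally not symmetric around the fitted probability.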

So predictdf will generally call stats::predict, which in turn dispatches to the correct predict method for the smoothing method's model class. The following arguments to stat_smooth are also worth considering.

Most model-fitting functions have a predict method associated with the class of the model. These usually take a newdata object and an argument se.fit that controls whether standard errors are computed (see ?predict for further details).
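As a small illustration of that interface, assuming an ordinary lm fit on the built-in mtcars data (the fit and nd names are just for this example), asking for se.fit = TRUE changes the return value from a plain vector to a list that also carries the standard errors and residual degrees of freedom:

```r
# Illustrative linear fit on the built-in mtcars data
fit <- lm(mpg ~ wt, data = mtcars)
nd <- data.frame(wt = c(2.5, 3.5))   # the newdata object

# With se.fit = TRUE, predict.lm returns a list instead of a vector
p <- predict(fit, newdata = nd, se.fit = TRUE)
names(p)   # "fit" "se.fit" "df" "residual.scale"
p$se.fit   # one standard error per row of nd
```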

se
display confidence interval around smooth? (TRUE by default, see level to control)

This is passed directly to the predict method so that it returns the appropriate standard errors (how they are computed depends on the method).

fullrange
should the fit span the full range of the plot, or just the data

This defines the x values in newdata at which the predictions are evaluated: across the full range of the plot, or only across the range of the data.

level
level of confidence interval to use (0.95 by default)

This is passed directly to the predict method so that it can choose the appropriate critical value (e.g. predict.lm uses qt((1 - level)/2, df) as the multiplier for the standard errors).
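To check that claim, here is a sketch assuming an lm fit on mtcars (illustrative names only): it rebuilds the confidence limits from se.fit and the qt() multiplier and compares them with what predict.lm returns for interval = "confidence". Note that qt((1 - level)/2, df) is the negative lower-tail quantile, so it is added for the lower limit and subtracted for the upper one.

```r
fit <- lm(mpg ~ wt, data = mtcars)
nd <- data.frame(wt = c(2.5, 3.5))
level <- 0.95

p <- predict(fit, newdata = nd, se.fit = TRUE)
tfrac <- qt((1 - level) / 2, p$df)   # negative lower-tail t quantile
manual <- cbind(lwr = p$fit + tfrac * p$se.fit,
                upr = p$fit - tfrac * p$se.fit)

# predict.lm's own confidence limits for the same newdata
ci <- predict(fit, newdata = nd, interval = "confidence", level = level)
all.equal(unname(ci[, c("lwr", "upr")]), unname(manual))
```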

n
number of points to evaluate smoother at

Used in conjunction with fullrange to define the x values in the newdata object.

Within a call to stat_smooth you can set se, which is passed on to the predict method (partially matched to se.fit, or used as se by methods that take that argument name) and, where necessary, also determines the interval argument. level gives the level of the confidence interval (0.95 by default).
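The default path can be sketched as a small wrapper around predict. This is a hypothetical function (predictdf_sketch is not ggplot2's predictdf, whose exact source may differ between versions); it assumes a model fitted as y ~ x, since stat_smooth works with the x and y aesthetics, and passes se.fit by its full name for clarity.

```r
# Hypothetical stand-in for the default predictdf behaviour described above
predictdf_sketch <- function(model, xseq, se, level) {
  pred <- predict(model, newdata = data.frame(x = xseq), se.fit = se,
                  level = level,
                  interval = if (se) "confidence" else "none")
  if (se) {
    # pred$fit is a matrix with columns fit, lwr, upr
    data.frame(x = xseq, y = pred$fit[, "fit"],
               ymin = pred$fit[, "lwr"], ymax = pred$fit[, "upr"],
               se = pred$se.fit)
  } else {
    # without standard errors predict returns a plain vector
    data.frame(x = xseq, y = as.vector(pred))
  }
}

# Columns named x and y to mirror the aesthetics stat_smooth works with
dat <- data.frame(x = mtcars$wt, y = mtcars$mpg)
model <- lm(y ~ x, data = dat)
band <- predictdf_sketch(model, xseq = seq(2, 5, length.out = 10),
                         se = TRUE, level = 0.95)
head(band)
```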

The newdata object is constructed internally: depending on fullrange, it is a sequence of n x values spanning either the full range of the plot or the range of the data.
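A minimal sketch of that construction, assuming a hypothetical helper make_xseq (not a ggplot2 function) and using n = 80, which is stat_smooth's default. The plot range is passed in explicitly here because, outside ggplot2, there is no plot scale to read it from.

```r
# Hypothetical helper: evenly spaced x values for the newdata object
make_xseq <- function(x, n = 80, fullrange = FALSE, plot_range = NULL) {
  rng <- if (fullrange && !is.null(plot_range)) plot_range else range(x, na.rm = TRUE)
  seq(rng[1], rng[2], length.out = n)
}

xs_data <- make_xseq(mtcars$wt)                    # spans the data only
xs_full <- make_xseq(mtcars$wt, n = 5, fullrange = TRUE,
                     plot_range = c(0, 6))         # spans the (assumed) plot range
xs_full   # 0.0 1.5 3.0 4.5 6.0
newdata <- data.frame(x = xs_data)
```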

For example, with method = rlm (from the MASS package), this will use predict.rlm, which is defined as:

# MASS:::predict.rlm
predict.rlm <- function (object, newdata = NULL, scale = NULL, ...)
{
    ## problems with using predict.lm are the scale and
    ## the QR decomp which has been done on down-weighted values.
    object$qr <- qr(sqrt(object$weights) * object$x)
    predict.lm(object, newdata = newdata, scale = object$s, ...)
}

So it internally calls predict.lm, after recomputing the QR decomposition on the square-root-weighted model matrix and passing the robust scale estimate object$s as the scale argument.
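What those two adjustments do can be checked without rlm itself. In this sketch the weights w = 1/cyl and the scale s = 2.5 are arbitrary assumptions standing in for rlm's final weights and scale estimate, and a weighted lm is used because it stores the QR decomposition of sqrt(w) * X, the same matrix predict.rlm rebuilds. The standard errors predict.lm returns with scale = s then equal s * sqrt(diag(X0 (X'WX)^-1 X0')).

```r
# Hypothetical weights and scale standing in for rlm's weights and s
dat <- transform(mtcars, w = 1 / cyl)
s <- 2.5

# A weighted lm stores the QR decomposition of sqrt(w) * X
fit <- lm(mpg ~ wt, data = dat, weights = w)
nd <- data.frame(wt = c(2.5, 3.5))
p <- predict(fit, newdata = nd, se.fit = TRUE, scale = s)

# The same standard errors by hand: s * sqrt(diag(X0 (X'WX)^-1 X0'))
X <- model.matrix(fit)                  # unweighted design matrix
X0 <- model.matrix(~ wt, data = nd)     # design matrix for newdata
XtWX <- crossprod(sqrt(dat$w) * X)      # X'WX via row-scaled X
manual_se <- s * sqrt(rowSums((X0 %*% solve(XtWX)) * X0))
all.equal(unname(p$se.fit), unname(manual_se))
```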
