Solved – Confidence interval for the slope of a GAM

Tags: generalized-additive-model, r, uncertainty

I need to fit a large number (thousands) of GAMs (using package mgcv in R with automated selection of smoothing terms, if that matters), and for each GAM I need to know whether the slope of the smooth term is negative over a certain range of my independent variable. Well that's easy: I just extract the fitted value at one end of the interval, and at the other end, and subtraction gives me my answer.
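For concreteness, my current approach looks roughly like this (toy model; the covariate and the endpoints `a` and `b` are stand-ins for my real ones):

```r
library("mgcv")

## Toy version of what I'm doing now: difference of fitted values
## at the two ends of the range of interest
set.seed(1)
dat <- gamSim(1, n = 200, dist = "normal", scale = 2)
mod <- gam(y ~ s(x2), data = dat, method = "REML")

a <- 0.2; b <- 0.6  # illustrative endpoints of the range
fits <- predict(mod, newdata = data.frame(x2 = c(a, b)))
slope <- unname(fits[2] - fits[1]) / (b - a)
slope < 0           # is the smooth decreasing over [a, b]?
```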

But does anybody have ideas for a (not-too-computationally-intensive) approach to understanding the uncertainty in this slope? The issue is that the uncertainty in the fitted values might not be independent, and I have no idea what form this dependency might generally take.

Even if you don't have ideas about how to rigorously sample from this uncertainty, I'd really appreciate any insight on whether ignoring the non-independence would be likely to yield a too-wide or too-narrow confidence interval for the slope.

Best Answer

You don't give an example, so I'm not sure what you mean by an "interval" (how wide is it?), but you run the risk of a biased estimate of the derivative of the smooth if your interval is long relative to the wiggliness of the estimated smooth.

It is quite easy to compute a finite-difference estimate of the derivative of an estimated smooth using existing tools in mgcv. I'll summarise the approach here, but it is cribbed from some blog posts I wrote that show how to do this and also cover simultaneous intervals, not just the across-the-function intervals shown below.

I'll also use some functions I wrote to do this analysis that are in my gratia package.

library("mgcv")
library("gratia")
set.seed(2)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)
mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

The basic idea is to use the type = "lpmatrix" option of the predict() method for gam objects. This returns the linear predictor matrix for a set of locations at which we evaluate the spline; multiplying it by the model coefficients yields predicted values for the smooth. We compute two of these matrices: one for a fine grid of points over the range of the covariate for the smooth, and a second at the same locations shifted by a tiny amount. Differencing the two matrices gives a linear predictor matrix for the derivative at the grid of points. The variance-covariance matrix of the model coefficients is then used to create a confidence interval. As the model is additive, we can ignore the other covariates and smooths and do this for each smooth in turn.
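A minimal manual sketch of these steps using only mgcv (essentially what gratia automates; the grid size, the shift eps, and the focus on s(x0) are arbitrary choices here, and the model is refit so the block is self-contained):

```r
library("mgcv")
set.seed(2)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)
mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

eps <- 1e-7  # tiny shift for the finite difference
## grid over x0; other covariates held fixed (they drop out of the difference)
newd <- with(dat, data.frame(x0 = seq(min(x0), max(x0), length = 200),
                             x1 = mean(x1), x2 = mean(x2), x3 = mean(x3)))
newd2 <- newd
newd2$x0 <- newd2$x0 + eps

X0 <- predict(mod, newd,  type = "lpmatrix")
X1 <- predict(mod, newd2, type = "lpmatrix")
Xp <- (X1 - X0) / eps  # linear predictor matrix of the derivative w.r.t. x0

## zero out columns not belonging to s(x0), so only that smooth contributes
want <- grepl("s\\(x0\\)", colnames(Xp))
Xp[, !want] <- 0

d  <- drop(Xp %*% coef(mod))                   # estimated first derivative
se <- sqrt(rowSums((Xp %*% vcov(mod)) * Xp))   # pointwise standard errors
crit <- qnorm(0.975)                           # 95% across-the-function interval
ci <- data.frame(x = newd$x0, est = d,
                 lower = d - crit * se, upper = d + crit * se)
```

Note that the standard errors come from the full variance-covariance matrix of the coefficients, which is exactly what handles the non-independence of the fitted values the question worries about.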

All of this is done by the fderiv() function in gratia:

## first derivatives of all smooths...
fd <- fderiv(mod)

## ...and a selected smooth
fd2 <- fderiv(mod, term = "x1")

Now fd and fd2 contain the finite-difference estimates of the derivative of all smooths, or just the selected one, from the fitted model. The blog posts linked below go into a lot more detail about what is going on under the hood.

We can generate a confidence interval for the derivative using the confint() method:

ci <- confint(fd, type = "confidence")
head(ci)
  term      lower      est    upper
1   x0 -0.8496032 4.112256 9.074116
2   x0 -0.8489453 4.112287 9.073519
3   x0 -0.8448850 4.112468 9.069821
4   x0 -0.8329612 4.112988 9.058936
5   x0 -0.8108548 4.113933 9.038721
6   x0 -0.7769721 4.115360 9.007693

This has info for all the smooths, and by default is a 95% interval. First, add on a column of values where the derivatives were evaluated:

ci <- cbind(ci, x = as.vector(fd[['eval']]))

Then we can plot:

library("ggplot2")
ggplot(ci, aes(x = x, y = est, group = term)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3) +
  geom_line() +
  facet_wrap( ~ term)

Giving:

[plot: estimated first derivatives with 95% confidence ribbons, one panel per smooth]
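Coming back to the original question, whether the slope is negative over a given range, with ci in hand, that's just a filter and a test of the band. A minimal helper (the smooth name and range endpoints in the usage line are illustrative, and this assumes ci has the term, x, lower, and upper columns built above):

```r
## TRUE if the whole confidence band for the derivative of `smooth`
## lies below zero for x in [lo, hi]
slope_is_negative <- function(ci, smooth, lo, hi) {
  sub <- ci[ci$term == smooth & ci$x >= lo & ci$x <= hi, ]
  nrow(sub) > 0 && all(sub$upper < 0)
}

## e.g. slope_is_negative(ci, "x2", lo = 0.7, hi = 0.9)
```

If you need this guarantee over the whole range at once rather than pointwise, a simultaneous interval (as discussed in the blog posts) is the more defensible choice; the pointwise band will be too narrow for that purpose.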

Blog posts that contain more detail: