GAMs Plotting – Plotting Generalized Additive Models on Response Scale with Multiple Smooth and Linear Terms

generalized-additive-modelmgcv

Here is a quick reproducible script to plot GAMs on the response scale instead of the "smooth" scale for a single term via the 'mgcv' R package.

I made the below script for a friend who is using GAMs for his PhD. work in evolutionary biology. I use the Hubble data from the 'gamair' R package simply as a demonstration.

##### Example of GAM Plotting on Response Scale #####

### Load required packages ###

library(gamair)
library(mgcv)

### Data setup ###

data(hubble)
hubble # 24 observations

x <- hubble$x # predictor
y <- hubble$y # response


### GAM setup ###

mod <- gam(y ~ s(x)) # thin plate GAM with k = 10 degrees of freedom (by default)

plot(mod) # smooth term is on y-axis

### GAM prediction ###

pd <- data.frame(x = seq(1, 24, by = 0.1)) # fine grid of points
pr <- predict(mod, newdata = pd, type = "response", se = TRUE) # get predicted response values from GAM


### GAM plotting ###

with(hubble, plot(x, y, ylim = c(0, 2000))) # plot data
lines(pd$x, pr$fit) # plot predicted fit
lines(pd$x, pr$fit - qnorm(0.975) * pr$se.fit, lty = 2) # plot lower 95% CI endpoint
lines(pd$x, pr$fit + qnorm(0.975) * pr$se.fit, lty = 2) # plot upper 95% CI endpoint

The above script works like a charm. The difficulty arises when the GAM contains multiple terms, even though only a single term is plotted at a time.

By a GAM with multiple terms I mean something like

mod <- gam(y ~ s(x) + z).

Here 'z' is linear term (not a smooth term).

Could someone (@gavinsimpson?) provide a quick example of plotting such a GAM on the response scale?

I haven't been able to find such an example online or in Simon Wood's great book on GAMs and mgcv.

Best Answer

If the model contains z then the effect of x estimated by the model is that given z is in the model. Hence the fitted response is the additive sum of the two effects, and we can't talk generally about the estimated values of the response for a range of values of x without also stating the value of z.

For Gaussian models, you can just add on the intercept in plot.gam() to shift around the smooth curve on the y-axis. See argument shift to plot.gam(). This assumes that as per the example x and z are unrelated in the model, and furthermore some value of z (in this case I think 0 as it is a linear term not subject to identifiability constraints).

A more general solution is just to predict from the model at a grid of values over x while holding z constant at some representative value, say its mean or median.

Here's a full example of doing this by hand:

library("mgcv")
library("ggplot2")

set.seed(1)
df <- gamSim()
m <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = df, method = "REML")

new_data <- with(df, expand.grid(x2 = seq(min(x2), max(x2), length = 200),
                                 x0 = median(x0),
                                 x1 = median(x1),
                                 x3 = median(x3)))

ilink <- family(m)$linkinv
pred <- predict(m, new_data, type = "link", se.fit = TRUE)
pred <- cbind(pred, new_data)
pred <- transform(pred, lwr_ci = ilink(fit - (2 * se.fit)),
                        upr_ci = ilink(fit + (2 * se.fit)),
                        fitted = ilink(fit))

ggplot(pred, aes(x = x2, y = fitted)) +
  geom_ribbon(aes(ymin = lwr_ci, ymax = upr_ci), alpha = 0.2) +
  geom_line()

producing

enter image description here

That script should be fine for any of the standard family options in mgcv, but you'll have to take careful note of what predict() returns for some of the fancier families in mgcv.