Here is a quick reproducible script to plot GAMs on the response scale instead of the "smooth" scale for a single term via the 'mgcv' R package.
I made the below script for a friend who is using GAMs for his PhD. work in evolutionary biology. I use the Hubble data from the 'gamair' R package simply as a demonstration.
##### Example of GAM Plotting on Response Scale #####
### Load required packages ###
library(gamair)
library(mgcv)
### Data setup ###
data(hubble)
hubble # 24 observations
x <- hubble$x # predictor
y <- hubble$y # response
### GAM setup ###
mod <- gam(y ~ s(x)) # thin plate GAM with k = 10 degrees of freedom (by default)
plot(mod) # smooth term is on y-axis
### GAM prediction ###
pd <- data.frame(x = seq(1, 24, by = 0.1)) # fine grid of points
pr <- predict(mod, newdata = pd, type = "response", se = TRUE) # get predicted response values from GAM
### GAM plotting ###
with(hubble, plot(x, y, ylim = c(0, 2000))) # plot data
lines(pd$x, pr$fit) # plot predicted fit
lines(pd$x, pr$fit - qnorm(0.975) * pr$se.fit, lty = 2) # plot lower 95% CI endpoint
lines(pd$x, pr$fit + qnorm(0.975) * pr$se.fit, lty = 2) # plot upper 95% CI endpoint
The above script works like a charm. The difficulty arises when the GAM contains multiple terms, even though only a single term is plotted at a time.
By a GAM with multiple terms I mean something like
mod <- gam(y ~ s(x) + z).
Here 'z' is linear term (not a smooth term).
Could someone (@gavinsimpson?) provide a quick example of plotting such a GAM on the response scale?
I haven't been able to find such an example online or in Simon Wood's great book on GAMs and mgcv.
Best Answer
If the model contains
z
then the effect ofx
estimated by the model is that givenz
is in the model. Hence the fitted response is the additive sum of the two effects, and we can't talk generally about the estimated values of the response for a range of values ofx
without also stating the value ofz
.For Gaussian models, you can just add on the intercept in
plot.gam()
to shift around the smooth curve on the y-axis. See argumentshift
toplot.gam()
. This assumes that as per the examplex
andz
are unrelated in the model, and furthermore some value ofz
(in this case I think 0 as it is a linear term not subject to identifiability constraints).A more general solution is just to predict from the model at a grid of values over
x
while holdingz
constant at some representative value, say its mean or median.Here's a full example of doing this by hand:
producing
That script should be fine for any of the standard
family
options in mgcv, but you'll have to take careful note of whatpredict()
returns for some of the fancier families in mgcv.