Solved – Overlaying GAM and LM in R

data visualizationgeneralized-additive-modelr

I am seeking to generate a figure that allows for an immediate comparison between a linear trend and a non-linear (flexible) trend. Both of these trend lines will be adjusted for several covariates.

I am able to code these two models in R: one using lm() and one using gam(). These models are identical, except that the gam() model specifies the variable of interest as a spline, while the lm() model specifies the variable of interest as a linear continuous variable.

Can anyone offer advice as to how I might generate a figure the includes both the spline-modeled variable from the gam() function and the linear parameter estimate from the lm() function?

The two models are described as follows, where "EXPOSURE" is the variable of interest that I'd like to plot:

model.lm <- lm(OUTCOME ~ EXPOSURE + COVARIATE, data=data)

model.gam = gam(OUTCOME ~ s(EXPOSURE, k=5) + COVARIATE, data=data)

NOTE: I acknowledge that I am very unfamiliar with R (SAS is my primary statistical coding language), and apologize if this is a poor question to ask here.

Best Answer

We can't use covariates in formula in ggplot2 (if we use the tidyverse). So we have to build the models and the fit independently, merge the dataframes, then we can plot our results.

library(tidyverse)
library(mgcv)

set.seed(0)

mydata <- tibble(exposure = 1:20,
                 covariate = rnorm(20, 0, 0.1),
                 outcome = rnorm(20, 5, 2))

fit_lm <- mydata %>% 
  lm(outcome ~ exposure + covariate, data = .) %>% 
  predict(newdata = mydata, interval = "confidence") %>% 
  as_data_frame()

fit_gam <- mydata %>% 
  gam(outcome ~ s(exposure, k = 5) + covariate, data = .) %>% 
  predict(newdata = mydata, type = "link", se.fit = TRUE) %>% 
  as_data_frame() %>% 
  rename(fit_gam = fit) %>% 
  mutate(lwr_gam = fit_gam - 2 * se.fit,
         upr_gam = fit_gam + 2 * se.fit) %>% 
  select(-se.fit)

mydata %>% 
  bind_cols(fit_lm, fit_gam) %>% 
  ggplot() +
    geom_point(aes(exposure, outcome)) +
    geom_line(aes(exposure, fit), size = 1, color = "red") +
    geom_ribbon(aes(x = exposure, ymin = lwr, ymax = upr), alpha = 0.2, fill = "red") +
    geom_line(aes(exposure, fit_gam), size = 1, color = "blue") +
    geom_ribbon(aes(x = exposure, ymin = lwr_gam, ymax = upr_gam), alpha = 0.2, fill = "blue")

plot

NB : check that the confidence interval for the GAM is correct for you (see Confidence interval for GAM model).

Related Question