Solved – Creating marginal plots when using mgcv (GAM) package in R

generalized-additive-modelmgcvr

Before training the GAM model I log-transformed several variables. The default plotting method plot.gam in mgcv can provide nice visualization of the marginal effect of a variable but seems not be able to transform the variable back to the original scale. Is there any easy way to do this?

If not, how can we make the marginal plots by hand? I seem to have figured out how to do this with one predictor. However, in the multiple predictor case, if we scan the variable of interest, what fixed values should we use for the rest of variables? Shall we take the mean for all other variables? What if the rest of variables have some categorical variables where the mean is not clear?

Best Answer

First, make sure that transforming your predictor (or response) variables is actually a reasonable thing to do (see, e.g., here).

Second, you'll have to do it manually if you mean back-transforming a predictor variable to an original scale. If you're talking about the response variable, read this again, and if you still decided to transform the response variable see the trans and possibly shift arguments in ?plot.gam.

But let's assume you're referring to back-transforming a predictor variable. For a marginal plot, setting the other continuous variables at their mean is a popular choice, but that need not be the case. If a predictor variable was highly skewed, for example, you could easily use the median as an option. Or a specific value relevant to the context. You'll have to think carefully about what makes the most sense in your application. Note also that using the mean of a log-transformed variable in your marginal plots puts it at the geometric mean on the original scale.

The same thought process applies to categorical variables --- you could produce the marginal plot for the reference class (as I illustrate below), another class that may be more relevant, or even all classes.

Below we use mgcv's simulation capabilities to generate a data set with 3 continuous predictors (x1, x2, and x3) and a categorical predictor (x0). For simplicity, imagine that x1 and x2 have already been log-transformed, so we're interested in returning them to their original scale in our marginal plots. Since we (pretended to) log-transform x1 and x2, it's a simple matter of exponentiating the values prior to plotting.

library(mgcv)
set.seed(123)

# Assume normal response
# Assume x1 and x2 have been log-transformed
xformed_data <- gamSim(5) 
mod <- gam(y ~ x0 + s(x1) + s(x2) + s(x3), data = xformed_data) 
plot(mod, pages = 1) 

# Marginal plots (transformed and original scale) for x1
# Generate new data set with categorical variable at reference class (i.e., x0 = "1")
# This marginal plot will illustrate the marginal relationship between x1 and y with
# x0 at the refence level, x2 at its geometric mean, and x3 at its mean

x1dat <- data.frame(x0 = factor("1"), 
                    x1 = seq(min(xformed_data$x1), max(xformed_data$x1), length.out = 100),
                    x2 = mean(xformed_data$x2), 
                    x3 = mean(xformed_data$x3))

x1fit <- predict(mod, newdata = x1dat, type = "response", se = TRUE) 

# Marginal plot for x1 on transformed scale; will match output from plot(mod)
# although confidence intervals will not match because uncertainty from other
# smooths and intercept now included
plot(y ~ x1, xformed_data) 
lines(x1dat$x1, x1fit$fit) 
lines(x1dat$x1, x1fit$fit +2 * x1fit$se.fit, lty=2, col = "red") 
lines(x1dat$x1, x1fit$fit -2 * x1fit$se.fit, lty=2, col = "red") 

# Marginal plot for x1 on original scale; will match output from plot(mod)
plot(y ~ exp(x1), xformed_data) 
lines(exp(x1dat$x1), x1fit$fit) 
lines(exp(x1dat$x1), x1fit$fit +2 * x1fit$se.fit, lty=2, col = "red") 
lines(exp(x1dat$x1), x1fit$fit -2 * x1fit$se.fit, lty=2, col = "red") 

# Marginal plots (transformed and original scale) for x2
# Generate new data set with categorical variable at reference class (i.e., x0 = "1")
# This marginal plot will illustrate the marginal relationship between x2 and y with
# x0 at the refence level, x1 at its geometric mean, and x3 at its mean

x2dat <- data.frame(x0 = factor("1"), 
                    x1 = mean(xformed_data$x1),
                    x2 = seq(min(xformed_data$x2), max(xformed_data$x2), length.out = 100), 
                    x3 = mean(xformed_data$x3))

x2fit <- predict(mod, newdata = x2dat, type = "response", se = TRUE) 

# Marginal plot for x2 on transformed scale; will match output from plot(mod)
# although confidence intervals will not match because uncertainty from other
# smooths and intercept now included
plot(y ~ x2, xformed_data) 
lines(x2dat$x2, x2fit$fit) 
lines(x2dat$x2, x2fit$fit +2 * x2fit$se.fit, lty=2, col = "red") 
lines(x2dat$x2, x2fit$fit -2 * x2fit$se.fit, lty=2, col = "red") 

# Marginal plot for x2 on original scale; will match output from plot(mod)
plot(y ~ exp(x2), xformed_data) 
lines(exp(x2dat$x2), x2fit$fit) 
lines(exp(x2dat$x2), x2fit$fit +2 * x2fit$se.fit, lty=2, col = "red") 
lines(exp(x2dat$x2), x2fit$fit -2 * x2fit$se.fit, lty=2, col = "red")