Solved – How to choose the type of GAM terms

generalized-additive-model, machine-learning, mgcv, r, regression

I have started working with GAMs in R and have acquired Simon Wood's excellent book on the topic ("Generalized Additive Models: An Introduction with R"). Based on one of his examples, I am looking at the following:

library(mgcv)
data(trees)
ct1<-gam(log(Volume) ~ Height + s(Girth), data=trees)

I have two general questions about this example:

  1. How does one decide whether a variable in the model should enter as a parametric term (such as Height) or as a smooth term (such as Girth)? Does one hold an advantage over the other, and is there a way to determine the optimal type for a variable? If anybody has any literature on this topic, I'd be happy to know of it.

  2. Say I want to look more closely at the weights of ct1 via ct1$coefficients. Can I use them as the gam procedure outputs them, or do I have to transform them before analyzing them, given that I am fitting to log(Volume)? In the latter case, I guess I would have to use exp(ct1$coefficients).

Best Answer

I'm assuming this is better explained in the 2nd edition of Simon's book (which should be out in a couple of days), as he and his students only worked out some of the relevant theory years after the first edition was written.

What Marra & Wood (2011) showed was that if we want to do selection on a model with smooth terms, then one very good approach is to add an extra penalty to all the smooth terms. This additional penalty works with the smoothness penalty for that term to control both the wiggliness of the term and whether a term should be in the model at all.

So, unless you have good theory to assume either smooth or linear/parametric effects for the covariates, you can approach the problem as choosing among all models representable by additive combinations of the basis functions, ranging from a model with a smooth of each covariate all the way down to one containing just an intercept.

For example:

library(mgcv)
data(trees)
ct1 <- gam(log(Volume) ~ s(Height) + s(Girth), data=trees, method = "REML", select = TRUE)

> summary(ct1)

Family: gaussian 
Link function: identity 

Formula:
log(Volume) ~ s(Height) + s(Girth)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.27273    0.01492   219.3   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
            edf Ref.df      F  p-value    
s(Height) 0.967      9  3.249 3.51e-06 ***
s(Girth)  2.725      9 75.470  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.975   Deviance explained = 97.8%
-REML = -23.681  Scale est. = 0.0069012  n = 31

Looking at the output (specifically the section Approximate significance of smooth terms), we note that both smooth terms are highly significant. But note the effective degrees of freedom (edf) value for the smooth of Height; it is ~1. What these tests are doing is explained in Wood (2013).

This suggests to me that Height should enter the model as a linear parametric term. We can evaluate this by plotting the fitted smooth:

> plot(ct1, select = 1, shade = TRUE, scale = 0, seWithMean = TRUE)

which gives:

[Figure: the fitted smooth of Height]

This clearly shows that the selected form of the effect of Height is linear.

However, if you didn't know this upfront (and you didn't, or you wouldn't have asked the question), you shouldn't now refit the model to these data using only a linear term for Height; that would cause real problems for inference down the line. The output from summary() on the select = TRUE fit has accounted for the fact that this selection took place. If you refit the model with a linear parametric effect of Height, that fit would not know about the selection and you would get overly optimistic p-values.
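To make the contrast concrete, here is a minimal sketch (for illustration only, not something to report; the object name ct2 is made up) of the naive post-selection refit being cautioned against:

## Naive post-selection refit: the p-value for Height here does not
## account for having used the data to decide Height is linear, so it
## will tend to be overly optimistic compared with summary(ct1)
ct2 <- gam(log(Volume) ~ Height + s(Girth), data = trees, method = "REML")
summary(ct2)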

As for question 2, as already mentioned in the comments: no, don't exponentiate the coefficients from this model. Also, don't delve into the components of fitted model objects directly; their contents are not always what you might expect. Use the extractor functions instead; in this case coef().
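For example, a minimal sketch: these values sit on the log(Volume) scale and most of them multiply spline basis functions, so exponentiating them individually is not meaningful.

## Extract the coefficients via the supported accessor rather than
## reaching into ct1$coefficients directly
coef(ct1)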

Later in the book, when Simon gets to GLMs and GAMs, you'll see him model these data with a Gamma GAM using a log link:

ct1 <- gam(Volume ~ Height + s(Girth), data=trees, method = "REML",
           family = Gamma(link = "log"))

In that model, because the fitting is done on the scale of the linear predictor (here the log scale), the coefficients could be exponentiated to get some partial effect, but you are better off using predict(ct1, ..., type = "response") to get fitted values/predictions back on the scale of the response (cubic feet for the trees data).
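For example, a minimal sketch (the covariate values in newd are made up for illustration):

## Predictions on the response scale; type = "response" applies the
## inverse of the log link, returning volumes in cubic feet
newd <- data.frame(Height = 76, Girth = 13)
predict(ct1, newdata = newd, type = "response")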


Marra, G. & Wood, S. N. Practical variable selection for generalized additive models. Comput. Stat. Data Anal. 55, 2372–2387 (2011).

Wood, S. N. On p-values for smooth components of an extended generalized additive model. Biometrika 100, 221–228 (2013).