Solved – Model selection and comparison in GAMM using R (mgcv)

generalized-additive-modelmgcvmodel comparisonmodel selectionr

I'm fitting a GAMM with correlation structure, using a non-Gaussian family. Here's an example of my global model:
M0 <- gamm(response ~ var1*var2 + var3 + s(var4) + s(var5) + s(var6,var7), random=list(placeID= ~1), correlation= corAR1(form= ~ year | placeID), data=data, family=quasipoisson)

I'd like to do term selection on the global model to drop any nonrelevant variables, but I'm not sure how it should be done. Based on the help text of gam.selectionand hoping it applies to GAMM as well! – I've used backward stepwise selection so far. The text says "It is perfectly possible to perform backwards selection using p-values in the usual way". Thus, I've started from the smoothers (removing smoothers from any linear and/or nonsignificant variable) and then moved on to linear variables. I would be interested to hear, whether you think this method is useful in GAMM or not. Please, also tell if you have something else to suggest (e.g. GCV mentioned below).

I would've also liked to compare models that differ in their structure regarding coordinates (smoothed or not, interaction included or not). I suspect that s(latitude,longitude) might be overfitting, and I would like to check this somehow. Since the model uses PQL, I gather that AIC is not recommended (although included in lme part).

The gam.selection text mentions different scores:
"In general the most logically consistent method to use for deciding which terms to include in the model is to compare GCV/UBRE/ML scores for models with and without the term." Is this method only for comparing models that differ by a single variable? If so, do you have any suggestions on how to compare models such as:

  • M1 <- gamm(response ~ var1*var2 + s(lat,long)+ s(var5), random=list(placeID= ~1), correlation= corAR1(form= ~ year | placeID), data=data, family=quasipoisson)
  • M2 <- gamm(response ~ var1*var2 + lat + s(var5), random=list(placeID= ~1), correlation= corAR1(form= ~ year | placeID), data=data, family=quasipoisson)

I had been under the impression that my global GAMM-model M0 uses GCV (the default), but when I call gam.check(M0$gam), I only get the four plots and no GCV score or other text output. Is there a way to acquire a GCV score of a GAMM object?

Best Answer

You are not fitting a GAM but a GAMM. If you look at ?gamm you'll see that there are two options for model fitting an AMM and only one when fitting a GAMM:

  method: Which of ‘"ML"’ or ‘"REML"’ to use in the Gaussian additive
          mixed model case when ‘lme’ is called directly. Ignored in
          the generalized case (or if the model has an offset), in
          which case ‘gammPQL’ is used.

As you are fitting with a quasipoisson family, the model fitting hierarchy is:

gamm() ---> MASS::glmmPQL() ---> nlme::lme()

and the likelihood that is optimised is a penalised quasi-likelihood (PQL). As the splines are treated as part fixed and part random effects in this formulation of the GAMM, smoothness selection boils down to using PQL and I am not aware of results that can advise on how well PQL performs in that regard (unlike REML selection with regular GAMs).

If you only need simple random effects, it might be possible to work in the GAM side of things:

  • In ?magic there is an example of fitting the GAM with correlated data (but in the Gaussian case.
  • In ?bam there is an option to specify an AR(1) parameter, but again, in the Gaussian case.

You would need to use bs = "re" in an s() term for the random effect.

Related Question