Random Effects Model – Why is There a Huge AIC Difference Between GAM and GAMM Models?

aicgeneralized-additive-modelrandom-effects-model

I'm working with spatial fisheries catch data and environmental variables, and I'm correlating the abundance in the catch to some oceanographic parameters. I'm using a Generalised Additive Model (GAM) and a Generalised Additive Mixed Model (GAMM) with one and two random effects (mgcv package in R), in particular:

m1 <- gam(kg ~ s(var1, k=10) + s(var2, k=10) + ... + offset(time), 
          family = Gamma(link=log), method=REML)

m2 <- gamm(kg ~ s(var1, k=10) + s(var2, k=10) + ... + offset(time), 
           family = Gamma(link=log), method=REML, random= list(vessel = ~ 1))

m3 <- gamm(kg ~ s(var1, k=10) + s(var2, k=10) + ... + offset(time), 
          family = Gamma(link=log), method=REML, 
          random= list(vessel = ~ 1, dayOfTheYear = ~ 1))

I get quite reasonable results in terms of residuals and covariates partial effects, but I get a huge difference in AIC between m1 and m2/m3, i.e. AIC(m1) = 58998, AIC(m2$lme) = 9410, and AIC(m3$lme) = 8751.

From what I understood, I should be able to compare GAM and GAMM from mgcv using AIC… Am I doing something wrong?

Thanks!

Best Answer

You shouldn't compare the AICs between objects fitted with different software. gam() is fitted via some fancy code fu in the mgcv package, whereas your gamm() fit is actually fitted via fancy code in the MASS (glmmPQL()) and then nlme (lme()) packages. It would be common for different constants to end up in the log likelihood.

When I read your question, I assumed you wanted to compare a GAM with no random effect to the same model with a random effect(s). To do that, fit the non-random effect model with gamm() too. For example, using @ACD's example data (from a now-deleted Answer):

set.seed(13)
x = rnorm(1000)
eff = as.factor(round(rnorm(1000)+5))
y = exp(x)*runif(1000)+as.numeric(eff)
plot(x,y)

gam_example = gamm(y ~ s(x), method="REML", family=Gamma(link="log"))
gamm_example = gamm(y ~ s(x), method="REML", random= list(eff = ~ 1), 
                    family = Gamma(link="log"))

Which gives two lines give:

> AIC(gam_example$lme)
    [1] -2.136317
    > AIC(gamm_example$lme)
[1] -1286.448

Hence there is strong support for the inclusion of the random effect and we can compare the AICs because they have been fitted via the same algorithm and code.

Technically, what @ACD shows (in a now-deleted Answer) is also incorrect. Whilst the two models are comparable in terms of both including a random effect for the eff variable, the AICs are not comparable because very different algorithms are used in the fitting:

## after running the code from @ACD's answer
> AIC(gam_example)
[1] 1721.197
> AIC(gamm_example$lme)
[1] -1286.448

The difference in AIC is meaningless; in this case, the gamm() is fitted using a penalised quasi likelihood criterion, where as the gam() is fitted using a standard REML (in this case) criterion.

Related Solutions

Generalized Additive Models – Predicting with Random Effects in mgcv gam

From version 1.8.8 of mgcv predict.gam has gained an exclude argument which allows for the zeroing out of terms in the model, including random effects, when predicting, without the dummy trick that was suggested previously.

predict.gam and predict.bam now accept an 'exclude' argument allowing terms (e.g. random effects) to be zeroed for prediction. For efficiency, smooth terms not in terms or in exclude are no longer evaluated, and are instead set to zero or not returned. See ?predict.gam.

library("mgcv")
require("nlme")
dum <- rep(1,18)
b1 <- gam(travel ~ s(Rail, bs="re", by=dum), data=Rail, method="REML")
b2 <- gam(travel ~ s(Rail, bs="re"), data=Rail, method="REML")

head(predict(b1, newdata = cbind(Rail, dum = dum)))    # ranefs on
head(predict(b1, newdata = cbind(Rail, dum = 0)))      # ranefs off
head(predict(b2, newdata = Rail, exclude = "s(Rail)")) # ranefs off, no dummy

> head(predict(b1, newdata = cbind(Rail, dum = dum)))    # ranefs on
       1        2        3        4        5        6 
54.10852 54.10852 54.10852 31.96909 31.96909 31.96909  
> head(predict(b1, newdata = cbind(Rail, dum = 0)))      # ranefs off
   1    2    3    4    5    6 
66.5 66.5 66.5 66.5 66.5 66.5
> head(predict(b2, newdata = Rail, exclude = "s(Rail)")) # ranefs off, no dummy
   1    2    3    4    5    6 
66.5 66.5 66.5 66.5 66.5 66.5

Older approach

Simon Wood has used the following simple example to check this is working:

library("mgcv")
require("nlme")
dum <- rep(1,18)
b <- gam(travel ~ s(Rail, bs="re", by=dum), data=Rail, method="REML")
predict(b, newdata=data.frame(Rail="1", dum=0)) ## r.e. "turned off"
predict(b, newdata=data.frame(Rail="1", dum=1)) ## prediction with r.e

Which works for me. Likewise:

dum <- rep(1, NROW(na.omit(Orthodont)))
m <- gam(distance ~ s(age, bs = "re", by = dum) + Sex, data = Orthodont)
predict(m, data.frame(age = 8, Sex = "Female", dum = 1))
predict(m, data.frame(age = 8, Sex = "Female", dum = 0))

also works.

So I would check the data you are supplying in newdata is what you think it is as the problem may not be with VesselID — the error is coming from the function that would have been called by the predict() calls in the examples above, and Rail is a factor in the first example.

Solved – Model selection and comparison in GAMM using R (mgcv)

You are not fitting a GAM but a GAMM. If you look at ?gamm you'll see that there are two options for model fitting an AMM and only one when fitting a GAMM:

  method: Which of ‘"ML"’ or ‘"REML"’ to use in the Gaussian additive
          mixed model case when ‘lme’ is called directly. Ignored in
          the generalized case (or if the model has an offset), in
          which case ‘gammPQL’ is used.

As you are fitting with a quasipoisson family, the model fitting hierarchy is:

gamm() ---> MASS::glmmPQL() ---> nlme::lme()

and the likelihood that is optimised is a penalised quasi-likelihood (PQL). As the splines are treated as part fixed and part random effects in this formulation of the GAMM, smoothness selection boils down to using PQL and I am not aware of results that can advise on how well PQL performs in that regard (unlike REML selection with regular GAMs).

If you only need simple random effects, it might be possible to work in the GAM side of things:

In ?magic there is an example of fitting the GAM with correlated data (but in the Gaussian case.
In ?bam there is an option to specify an AR(1) parameter, but again, in the Gaussian case.

You would need to use bs = "re" in an s() term for the random effect.

Best Answer

Related Solutions

Generalized Additive Models – Predicting with Random Effects in mgcv gam

Older approach

Solved – Model selection and comparison in GAMM using R (mgcv)

Related Question