Generalized Additive Model – Variable and Model Selection Techniques

aic, generalized-additive-model, model-selection, regularization, reml

I know this type of question has been asked many times before, so I apologize for re-posting about it. I bring it up again because this approach was taught in one of my courses and I want to make sure I'm learning the correct methods. I can't share the data, but this is the part of the code that I'm confused by:

library(mgcv)

# M1: smooths of x and Year, additive factor effects
M1 <- gam(y ~ s(x, bs = "cr") + s(Year, bs = "cr") +
            factor_A + factor_B,
          method = "REML",
          select = TRUE,
          data = df)

# M2: as M1, but with the factor_A:factor_B interaction
M2 <- gam(y ~ s(x, bs = "cr") + s(Year, bs = "cr") +
            factor_A * factor_B,
          method = "REML",
          select = TRUE,
          data = df)

# M3: smooth of x varies by level of factor_B
M3 <- gam(y ~ s(x, bs = "cr", by = factor_B) + s(Year, bs = "cr") +
            factor_A + factor_B,
          method = "REML",
          select = TRUE,
          data = df)

# M4: as M3, but with z in place of x
M4 <- gam(y ~ s(z, bs = "cr", by = factor_B) + s(Year, bs = "cr") +
            factor_A + factor_B,
          method = "REML",
          select = TRUE,
          data = df)

# M5: smooths of x and Year both vary by level of factor_B
M5 <- gam(y ~ s(x, bs = "cr", by = factor_B) +
            s(Year, bs = "cr", by = factor_B) +
            factor_A + factor_B,
          method = "REML",
          select = TRUE,
          data = df)

# M6: smooths of x and z, smooth of Year varies by level of factor_B
M6 <- gam(y ~ s(x, bs = "cr") + s(z, bs = "cr") +
            s(Year, bs = "cr", by = factor_B) +
            factor_A + factor_B,
          method = "REML",
          select = TRUE,
          data = df)

# M7: tensor-product smooth interaction of x and Year
M7 <- gam(y ~ te(x, Year) +
            factor_A + factor_B,
          method = "REML",
          select = TRUE,
          data = df)

# M8: smooth of x varies by fYear (presumably Year treated as a factor)
M8 <- gam(y ~ s(x, by = fYear) +
            fYear + factor_A + factor_B,
          method = "REML",
          select = TRUE,
          data = df)

# M9: smooth of Year varies by level of factor_B
M9 <- gam(y ~ s(Year, by = factor_B) +
            factor_A + factor_B,
          method = "REML",
          select = TRUE,
          data = df)

AIC(M1, M2, M3, M4, M5, M6, M7, M8, M9)

Here y is a continuous response, and x and z are continuous covariates.

GAM models specifically (not glm) fit by REML can be compared with AIC, so that part seems fine. As I understand it, though, models shouldn't be compared if they were fitted by REML and have different fixed effects; this seems to be the case with models M4 and M6, as they contain an extra z term. Also, select = TRUE is set in every model before they are compared with AIC. I took this option to be a separate method of selection, not to be used in combination with the AIC method; that is, select = TRUE can only be used on a single (final) model AFTER using AIC, as a form of variable selection. Am I mistaken here? Thank you!


Best Answer

When fitting multiple models you can use AIC, but you have to appreciate what metric the models are being assessed on. AIC is an approximation to the leave-one-out cross-validation error, so the metric by which you are comparing models is their predictive ability. Using this as your metric, you'd be adding/removing variables at an implicit significance level of ~0.16 (instead of the "usual" 0.05, not that this usual value is a good option).
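To see where that ~0.16 comes from: for two nested models differing by one parameter, AIC prefers the larger model when twice the gain in log-likelihood exceeds the extra penalty of 2, which is a likelihood-ratio test with a chi-squared(1) critical value of 2. A quick check in R:

# AIC = -2*logLik + 2*k, so one extra parameter "pays for itself" when
# the likelihood-ratio statistic exceeds 2; under a chi-squared(1) null
# that threshold corresponds to an implicit alpha of about 0.157
pchisq(2, df = 1, lower.tail = FALSE)
#> [1] 0.1572992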

Once you've chosen your model with AIC, you then have the problem that any inference you might do is messed up by all the model selection you did. This is the point @Frank Harrell makes in his comment. If you only care about prediction then you might not be doing any post-selection inference (looking at p-values, plotting smooths with their credible intervals, etc.), but if you are going to do that inference, selection via AIC (or other types of stepwise selection) is going to mess things up until we have a good theory for post-selection inference. That is an active area of research, though I've not seen anything for GAMs as yet.

select = TRUE can only be used on a single (final) model AFTER using AIC, as a form of variable selection. Am I mistaken here?

Yes; you are mistaken. One would use select = TRUE instead of doing model selection by AIC, largely because the consequences of doing the selection via the extra shrinkage penalties can be accounted for in the model's summary() output/tests, whereas they can't (yet) be accounted for if you have done AIC-based selection.
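As a minimal sketch of that workflow (reusing the variables from your M1): fit one well-considered model with the extra penalties switched on and read the summary directly; smooths that the penalties shrink away will show effective degrees of freedom (edf) near zero.

library(mgcv)

# select = TRUE adds a shrinkage penalty to the null space of each
# smooth, so whole smooth terms can be shrunk out of the model
M_full <- gam(y ~ s(x, bs = "cr") + s(Year, bs = "cr") +
                factor_A + factor_B,
              method = "REML",
              select = TRUE,
              data = df)

summary(M_full)  # smooths that are shrunk away have edf close to 0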

From the example code you show, it seems there is interest in testing whether one or more smooth-smooth or factor-smooth interactions are present. Unless I was solely interested in creating a prediction tool, I would not be comparing all those models via AIC.

Furthermore, it seems like the models are exploring different hypotheses about the temporal aspects of the data set: does the effect of x vary smoothly over the years, or does the effect of time (Year) differ between levels of factor_B?

None of these seem like something I'd want to compare using AIC (unless I was purely trying to find a model for prediction).

Personally, if I thought there was an interaction between the two factor variables, I'd fit that interaction and then consider the estimated effect size of that interaction. I wouldn't decide to exclude the interaction on the basis of a p-value (and AIC selection is effectively using p-values, just at a significance level of ~0.16 for terms differing by 1 DF), not least because exclusion is a very strong statement that the interaction effect is exactly 0, which is very unlikely.
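For instance (a sketch reusing your M2 specification), keep the interaction in the model and look at the size and uncertainty of its coefficients rather than testing it away:

# fit the model with the parametric factor interaction retained
M_int <- gam(y ~ s(x, bs = "cr") + s(Year, bs = "cr") +
               factor_A * factor_B,
             method = "REML",
             data = df)

b   <- coef(M_int)
se  <- sqrt(diag(vcov(M_int)))
idx <- grep(":", names(b), fixed = TRUE)  # the interaction coefficients

# approximate 95% intervals for the interaction effect sizes
cbind(estimate = b[idx],
      lower    = b[idx] - 1.96 * se[idx],
      upper    = b[idx] + 1.96 * se[idx])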

The choice of smooth terms doesn't make sense to me; perhaps it does in the context of the scientific problem you are working on? The z covariate only shows up in two of the models (M4 and M6), for example; why is that? It might just be that your instructor is having you do something that isn't "best practice" in order to show you what can happen when you use those not-best-practice techniques, i.e. to make a pedagogical point. Without more context on why those candidate models were chosen, it's hard to comment further.

As to your discoveries:

  • The Workshop presentation you link to in the comments focuses too much (IMHO) on AIC as the selection metric.

  • select = TRUE can be used with AIC, but using select = TRUE rather renders the whole model-selection-by-AIC exercise redundant, as the extra penalties on the smooths are doing the selection for you. That doesn't mean the AIC computed on such a model is wrong; you could still compare it with a model that didn't have the extra shrinkage applied, for example (see the sketch after this list).

  • select = TRUE isn't only applicable when you have one model in mind. You might have two or a few well-chosen models in mind, and you can apply the extra shrinkage to all of them. But yes, it is typical that you have a full model (something that contains a well-thought-out selection of covariates/terms, not simply every variable you have in your data set).

  • The AIC used in mgcv is a special AIC suited to the situation where smoothness-parameter selection is going on: it replaces the $2k$ penalty of common-or-garden AIC with one based on a corrected effective degrees of freedom that accounts for the uncertainty in the estimated smoothness parameters. I am not aware of similar results for BIC, so I would use BIC with care with GAMs.
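As a sketch of the comparison mentioned in the second bullet (reusing your M1 specification), the same model fitted with and without the extra shrinkage penalties can itself be compared on mgcv's AIC:

# the same model with and without the extra shrinkage penalties
M_shrink <- gam(y ~ s(x, bs = "cr") + s(Year, bs = "cr") +
                  factor_A + factor_B,
                method = "REML",
                select = TRUE,
                data = df)

M_plain <- update(M_shrink, select = FALSE)

AIC(M_shrink, M_plain)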
