Is your goal model parsimony or the predictive power of the model? If parsimony, use AIC; if predictive power, use $R^2$. Usually the answers agree, but they can differ when you compare models with very similar $R^2$ or with a number of low-quality predictors. This is why in ordinary regression we tend to look at adjusted $R^2$ rather than raw $R^2$: the adjustment penalizes $R^2$ for the variance one would expect to be explained by chance even if a predictor were not effective at all. As the author says in the blog post, "although I should note that [$R^2$ is] a poor tool for model selection, since it almost always favors the most complex models".
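To illustrate the point about the penalty (a base-R sketch I've made up for illustration; the variable names are not from the question): adding pure-noise predictors can only increase raw $R^2$ in nested linear models, while adjusted $R^2$ and AIC charge for the extra parameters and so need not favor the bigger model.

```r
set.seed(1)
n <- 30
d <- data.frame(y = rnorm(n), x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))

m_small <- lm(y ~ x1, data = d)
m_big   <- lm(y ~ x1 + x2 + x3, data = d)  # x2, x3 are pure noise

## raw R^2 can only increase when predictors are added to a nested model
summary(m_big)$r.squared >= summary(m_small)$r.squared  # TRUE

## adjusted R^2 and AIC penalize the extra parameters,
## so they need not favor the bigger model
c(adj_small = summary(m_small)$adj.r.squared,
  adj_big   = summary(m_big)$adj.r.squared)
c(AIC_small = AIC(m_small), AIC_big = AIC(m_big))
```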
P.S. If you are interested in the fixed effects relative to the overall variance (regardless of nesting factor), then you probably want to be looking at the marginal $R^2$.
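For example (a sketch assuming the MuMIn and lme4 packages are installed; the sleepstudy data and model are purely illustrative, not from the question), MuMIn::r.squaredGLMM() reports both versions side by side:

```r
library(lme4)   # for lmer() and the sleepstudy example data
library(MuMIn)  # for r.squaredGLMM()

fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

## R2m = marginal R^2 (fixed effects only, relative to total variance);
## R2c = conditional R^2 (fixed plus random effects)
r.squaredGLMM(fm)
```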
The first three models you've constructed differ in how the parameters are defined, but they have the same number of parameters, and the fits are equivalent in every way except for the numerical values of the parameters.
We can illustrate this with a plain linear model - mixed models just complicate the issue.
set.seed(101)
dd <- expand.grid(light=c("day","dusk","night"),
tide=c("base","Flooding","Ebbing"))
dd$y <- rnorm(nrow(dd))
## add one more row so fit isn't perfect
dd <- rbind(dd,dd[1,])
dd$y[nrow(dd)] <- rnorm(1)
Use model.matrix to see what parameters R will construct when fitting the model (you could also use names(coef(...)) on the output of lm(), or names(fixef(...)) on the output of (g)lmer).
tmpf <- function(f) {
model.matrix(f,data=dd)
}
colnames(m1 <- tmpf(~light+tide+light:tide))
## [1] "(Intercept)" "lightdusk"
## [3] "lightnight" "tideFlooding"
## [5] "tideEbbing" "lightdusk:tideFlooding"
## [7] "lightnight:tideFlooding" "lightdusk:tideEbbing"
## [9] "lightnight:tideEbbing"
If we use the * operator, we get the interaction plus the main effects; if we redundantly specify the main effects, R silently drops them.
all.equal(m1,tmpf(~light*tide)) ## TRUE
all.equal(m1,tmpf(~light+light*tide)) ## TRUE
all.equal(m1,tmpf(~light+tide+light*tide)) ## TRUE
If we use : but leave out one of the main effects, we get the same number of parameters (9), but they are organized differently:
colnames(m2 <- tmpf(~light+light:tide))
## [1] "(Intercept)" "lightdusk"
## [3] "lightnight" "lightday:tideFlooding"
## [5] "lightdusk:tideFlooding" "lightnight:tideFlooding"
## [7] "lightday:tideEbbing" "lightdusk:tideEbbing"
## [9] "lightnight:tideEbbing"
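A quick check of the earlier claim that these parameterizations give equivalent fits (repeating the simulated-data setup from above so the snippet runs on its own):

```r
set.seed(101)
dd <- expand.grid(light = c("day", "dusk", "night"),
                  tide  = c("base", "Flooding", "Ebbing"))
dd$y <- rnorm(nrow(dd))
dd <- rbind(dd, dd[1, ])       # one extra row so the fit isn't perfect
dd$y[nrow(dd)] <- rnorm(1)

f1 <- lm(y ~ light * tide, data = dd)
f2 <- lm(y ~ light + light:tide, data = dd)

## same fitted values (and hence same likelihood);
## only the coefficient values and names differ
all.equal(unname(fitted(f1)), unname(fitted(f2)))  # TRUE
```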
As I explain elsewhere, it rarely makes sense to fit a model with interactions present but main effects missing; the only way I know of to do this is to construct the dummy variables yourself (either by hand, or by building the model matrix, dropping the columns you don't want, and using the remaining model-matrix columns as (numeric) predictor variables).
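The model-matrix route could be sketched as follows (hypothetical, and rarely advisable as noted above; the simulated dd data are rebuilt here so the snippet is self-contained):

```r
set.seed(101)
dd <- expand.grid(light = c("day", "dusk", "night"),
                  tide  = c("base", "Flooding", "Ebbing"))
dd$y <- rnorm(nrow(dd))
dd <- rbind(dd, dd[1, ])
dd$y[nrow(dd)] <- rnorm(1)

## build the full model matrix, then drop the tide main-effect columns
## (interaction columns start with "light", so they are kept)
mm   <- model.matrix(~ light * tide, data = dd)
keep <- !grepl("^tide", colnames(mm))
X    <- mm[, keep, drop = FALSE]

## fit using the remaining columns as numeric predictors;
## -1 because X already contains an (Intercept) column
fit <- lm(dd$y ~ X - 1)
length(coef(fit))  # 7 parameters: the original 9 minus the 2 tide columns
```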
The MuMIn package tries to do the right thing; from ?dredge:
By default, marginality constraints are respected, so “all possible combinations” include only those containing interactions with their respective main effects and all lower order terms.
library(MuMIn)
full_model <- lm(y~light*tide,data=dd,na.action="na.fail")
(dmods <- dredge(full_model))
## Model selection table
## (Int) lgh tid lgh:tid df logLik AICc delta weight
## 8 -0.27460 + + + 10 23.541 -247.1 0.00 1
## 1 0.24500 2 -8.291 22.3 269.38 0
## 3 -0.16790 + 4 -5.948 27.9 274.98 0
## 2 0.07096 + 4 -7.821 31.6 278.72 0
## 4 -0.25820 + + 6 -5.543 51.1 298.17 0
As you can see, dredge has not tried to fit any models that include the interaction but are missing some of the main effects.
Best Answer
It is not clear what you are ultimately trying to do with the final model-averaged model, so my comments are mostly in terms of prediction and/or interpreting covariates in the model.
The outlined approach sounds like it could lead to considerable over-fitting, to hypothesis tests whose actual level deviates from the nominal one, and to poorly calibrated or overconfident predictions. For a start, the initial stepwise model selection is ignored in the final model averaging, and so, it seems, is the uncertainty about the spatial correlation structure. One approach that would avoid this is to do model averaging across all the considered models (i.e. without first doing stepwise selection), including both the model with equal correlation across sites and the one with a more complex spatial correlation structure. Ignoring any part of the model uncertainty means ignoring part of the uncertainty you should have at the end (in predictions, in coefficients, etc.).
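As a sketch of averaging over the whole candidate set rather than conditioning on one selected model (assuming the MuMIn package is installed; the simulated data from the other answer are reused here purely for illustration):

```r
library(MuMIn)  # for dredge() and model.avg()

set.seed(101)
dd <- expand.grid(light = c("day", "dusk", "night"),
                  tide  = c("base", "Flooding", "Ebbing"))
dd$y <- rnorm(nrow(dd))
dd <- rbind(dd, dd[1, ])
dd$y[nrow(dd)] <- rnorm(1)

full_model <- lm(y ~ light * tide, data = dd, na.action = "na.fail")

## average coefficients/predictions across *all* candidate models,
## weighted by AICc, instead of picking a single "best" model
avg <- model.avg(dredge(full_model))
summary(avg)
```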
Another thing to consider is that, depending on what you wish to claim at the end, you may still have a multiplicity issue (e.g. if you want to say something like "covariate X is significant, covariate Y is significant, etc."), which you did not address.
Of course, you are working with little data and a huge space of potential models. It may be tempting to reduce that space somehow, but one of the best tools for doing so is up-front thinking rather than stepwise (or other) model selection. The documentation for the dredge function even provides the following nice quote (I did not verify the exact quote in the book):