lmer model simplification


I am trying to do model simplification, looking at how different factors may affect distance. I have snails kept in several habitats, and I want to see whether that affects how closely another snail follows a given snail. I start off with this model:

  library(lme4)
  model1 <- lmer(sqrt(dist + 6) ~ (1 | snail) + food + stress + food:stress + weight + OriginalL + FollowedL)

and the summary is this:

  Linear mixed model fit by REML ['lmerMod']
  Formula: sqrt(dist + 6) ~ (1 | snail) + food + stress + food:stress +
      weight + OriginalL + FollowedL

  REML criterion at convergence: 561.1

  Scaled residuals:
      Min      1Q  Median      3Q     Max
  -2.2941 -0.7698 -0.3347  0.7515  1.9564

  Random effects:
   Groups   Name        Variance Std.Dev.
   snail    (Intercept) 0.000    0.000
   Residual             2.334    1.528
  Number of obs: 148, groups:  snail, 37

  Fixed effects:
                                 Estimate Std. Error t value
  (Intercept)                    4.960927   0.662947   7.483
  foodSweetPotato               -0.219039   0.357768  -0.612
  stressshelter                 -0.246649   0.355999  -0.693
  weight                         0.002520   0.063259   0.040
  OriginalL                      0.015549   0.013072   1.189
  FollowedL                     -0.008044   0.005972  -1.347
  foodSweetPotato:stressshelter -0.300143   0.503215  -0.596

  Correlation of Fixed Effects:
              (Intr) fdSwtP strsss weight OrgnlL FllwdL
  foodSwetPtt -0.309
  stressshltr -0.315  0.502
  weight      -0.615  0.008  0.009
  OriginalL   -0.617 -0.021  0.032  0.123
  FollowedL   -0.470  0.118  0.059  0.087 -0.004
  fdSwtPtt:st  0.230 -0.707 -0.708 -0.008 -0.024 -0.055

Should I remove the least significant factor or remove the interactions first?

And after this is it a simple anova between my first model and most simplified model?

Best Answer

A very short answer:

  • these questions aren't really specific to mixed models; they apply generally to simplification/model selection for any form of linear model or related framework.
  • in general, it doesn't make sense to worry at all about inference on, or selection of, main effects when there are interactions involving those main effects in the model. This is called the principle of marginality (sorry, that Wikipedia page is a mess, but it gives you a little more information ...). So the narrow-sense answer to your question is to always consider removing interactions first and, as a corollary, never to consider removing main effects if an interaction that involves them is retained in the model.
  • stepwise model selection, while still very popular, has some major problems; you should consider whether you really want to drop terms from your model at all. See, e.g.:
    • Flom, Peter L., and David L. Cassell. “Stopping Stepwise: Why Stepwise and Similar Selection Methods Are Bad, and What You Should Use.” In NorthEast SAS Users Group Inc 20th Annual Conference: 11-14th November 2007; Baltimore, Maryland, 2007.
    • Harrell, Frank E. Regression Modeling Strategies. Springer. (Or see the Stata FAQ for an abbreviated version.)
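The marginality rule is visible directly in R's drop1(), which only offers to drop terms that are not contained in a higher-order term. The sketch below is a minimal illustration under stated assumptions: it uses base-R lm() because, as noted above, the logic is the same for any linear model, and it runs on simulated data. The data frame d, the baseline levels "Other" and "none", and all values are hypothetical, not the snail data from the question. (lme4 also provides a drop1() method for lmer fits.)

```r
# Sketch on SIMULATED, hypothetical data (not the snail data from the question).
# Plain lm() is used because the marginality logic is the same for any linear model.
set.seed(42)
n <- 148
d <- data.frame(
  food   = factor(sample(c("Other", "SweetPotato"), n, replace = TRUE)),  # "Other": made-up baseline level
  stress = factor(sample(c("none", "shelter"), n, replace = TRUE)),       # "none": made-up baseline level
  weight = rnorm(n, mean = 5, sd = 1),
  dist   = runif(n, min = 0, max = 30)
)

# Full model: both main effects plus their interaction.
fit_full <- lm(sqrt(dist + 6) ~ food + stress + food:stress + weight, data = d)

# drop1() only lists terms not contained in a higher-order term,
# so food and stress are NOT offered while food:stress is in the model.
step1 <- drop1(fit_full, test = "F")
print(step1)

# Once the interaction is removed, the main effects become candidates.
fit_main <- update(fit_full, . ~ . - food:stress)
step2 <- drop1(fit_main, test = "F")
print(step2)
```

Read step1 for the interaction first; step2 only becomes relevant if you decide to drop food:stress.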

I'm not sure what you mean by "is it a simple anova between my first model and most simplified model". If you want to do inference on the terms in the model, you can use a likelihood ratio test (implemented via anova() in R), an F test, or ...
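A likelihood ratio test of one term amounts to comparing two nested fits with anova(). The sketch below assumes the lme4 package and again uses simulated, hypothetical data: 37 snails with 4 observations each, with food and stress assigned per snail. The level names and all values are made up. lme4's anova() method refits REML fits by maximum likelihood before comparing them (its refit argument defaults to TRUE); here both models are fitted with REML = FALSE directly, and the chi-squared statistic is also computed by hand to show what the test does.

```r
library(lme4)

# SIMULATED, hypothetical data: 37 snails x 4 observations = 148 rows.
# food and stress are assigned per snail in a balanced pattern; level
# names and all numbers are made up for illustration.
set.seed(1)
n_snail <- 37
snails <- data.frame(
  snail  = factor(seq_len(n_snail)),
  food   = factor(rep(c("Other", "SweetPotato"), length.out = n_snail)),
  stress = factor(rep(c("none", "shelter"), each = 2, length.out = n_snail)),
  u      = rnorm(n_snail, sd = 2)  # simulated snail-level deviation
)
d <- snails[rep(seq_len(n_snail), each = 4), ]
d$weight <- rnorm(nrow(d), mean = 5, sd = 1)
d$dist   <- pmax(0, 20 + d$u + rnorm(nrow(d), sd = 5))

# Full and reduced models, both fitted by ML so their likelihoods are comparable.
m_full <- lmer(sqrt(dist + 6) ~ (1 | snail) + food + stress + food:stress + weight,
               data = d, REML = FALSE)
m_red  <- update(m_full, . ~ . - food:stress)

# Likelihood ratio test for the interaction (reduced model listed first).
lrt <- anova(m_red, m_full)
print(lrt)

# The same test by hand: twice the log-likelihood gain, compared with a
# chi-squared distribution on 1 df (the 2 x 2 interaction adds one coefficient).
chisq   <- max(0, as.numeric(2 * (logLik(m_full) - logLik(m_red))))
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)
```

For F tests on lmer fits, packages such as lmerTest (Satterthwaite approximation) or pbkrtest (Kenward-Roger) are commonly used; they are not part of this sketch.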
