Solved – Comparing GLM/Lmer Models

generalized linear modellme4-nlmemixed modelr

I am struggling with choosing the correct model for my study, and I hoped that maybe someone would be able to help me, or shine some light please 🙂

I have lots of data about vegetation and individual preferences that I am trying to analyse. I tried using a mixed model (lmer) to begin with, with the 3 different fields where I repeated the study as a random effect. I began by making my model including every single interaction, but it was too much, and R gave me this error message:

"fixed-effect model matrix is rank deficient so dropping 151 columns / coefficients.
Error: Dropping columns failed to produce full column rank design matrix"

So I dropped the interactions and just did the other factors. I did one version as a glm without the random effect, and one as an lmer with the random effect.
I then tried to compare them using anova, but I don't understand the results, I'll put the code and results below.

Please can you guys have a look and tell me what you think, that would be great, Thank you! (Sorry this is so long)

mod2 <- glm(Buffer ~ Age + Sex + Captures
        + PC1+ PC2+ Lvl1_Av + Lvl1_Med
        + Lvl1_SD+ Lvl1_Sum+ Lvl2_Av+ Lvl2_Med
        + Lvl2_SD+ Lvl2_Sum+ Lvl3_Av
        + Lvl3_Med
        + Lvl3_SD
        + Lvl3_Sum
        + Lvl4_Av
        + Lvl4_Med 
        + Lvl4_SD 
        + Lvl4_Sum )

mod3 <- lmer(Buffer ~ Age + Sex + Captures
        + PC1+ PC2+ Lvl1_Av + Lvl1_Med
        + Lvl1_SD+ Lvl1_Sum+ Lvl2_Av+ Lvl2_Med
        + Lvl2_SD+ Lvl2_Sum+ Lvl3_Av
        + Lvl3_Med
        + Lvl3_SD
        + Lvl3_Sum
        + Lvl4_Av
        + Lvl4_Med 
        + Lvl4_SD 
        + Lvl4_Sum + (1|Fence))

anova(mod2, mod3, test="Chisq")

#And this is what I got

> anova(mod2, mod3, test="Chisq")
Analysis of Deviance Table

Model: gaussian, link: identity

Response: Buffer

Terms added sequentially (first to last)


         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                        80    7541772              
Age       1   335703        79    7206069 < 2.2e-16 ***
Sex       1  1225073        78    5980996 < 2.2e-16 ***
Captures  1  1365027        77    4615968 < 2.2e-16 ***
PC1       1     9632        76    4606337 0.0001964 ***
PC2       1   194968        75    4411369 < 2.2e-16 ***
Lvl1_Av   1      883        74    4410486 0.2596526    
Lvl1_Med  1    24511        73    4385975 2.848e-09 ***
Lvl1_SD   1    69605        72    4316370 < 2.2e-16 ***
Lvl1_Sum  1  4229768        71      86602 < 2.2e-16 ***
Lvl2_Av   1      250        70      86352 0.5485363    
Lvl2_Med  1      360        69      85992 0.4713995    
Lvl2_SD   1      237        68      85755 0.5589011    
Lvl2_Sum  1    24078        67      61676 3.922e-09 ***
Lvl3_Av   1     1330        66      60346 0.1664550    
Lvl3_Med  1      345        65      60001 0.4810493    
Lvl3_SD   1        2        64      59999 0.9524658    
Lvl3_Sum  1     1395        63      58604 0.1564064    
Lvl4_Av   1      304        62      58300 0.5085230    
Lvl4_Med  1     3928        61      54372 0.0174052 *  
Lvl4_SD   1       85        60      54286 0.7260341    
Lvl4_Sum  1    13301        59      40985 1.210e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Best Answer

You are specifying the model comparison wrong.

  1. There is no reason to use glm with a Gaussian family. Use lm as it is fully equivalent but computationally superior.

  2. You need to ensure that R uses the correct method for the anova generic. Since this is an S3 generic and method dispatch works only on the first argument, the lmer model must be first. Your code actually calls anova.glm, which does not do the intended model comparison.

So, in summary, with an easy example based on the iris dataset:

mod1 <- lm(Sepal.Length ~ Sepal.Width, data = iris)
mod2 <- lmer(Sepal.Length ~ Sepal.Width + (1 | Species), data = iris)

anova(mod2, mod1, test="Chisq")
#refitting model(s) with ML (instead of REML)
#Data: iris
#Models:
#fit1: Sepal.Length ~ Sepal.Width
#fit2: Sepal.Length ~ Sepal.Width + (1 | Species)
#     Df    AIC    BIC   logLik deviance  Chisq Chi Df Pr(>Chisq)    
#fit1  3 371.99 381.02 -182.996   365.99                             
#fit2  4 200.53 212.57  -96.265   192.53 173.46      1  < 2.2e-16 ***
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Related Question