I am struggling with choosing the correct model for my study, and I hoped that maybe someone would be able to help me, or shine some light please 🙂
I have lots of data about vegetation and individual preferences that I am trying to analyse. I tried using a mixed model (lmer) to begin with, with the 3 different fields where I repeated the study as a random effect. I began by making my model including every single interaction, but it was too much, and R gave me this error message:
"fixed-effect model matrix is rank deficient so dropping 151 columns / coefficients.
Error: Dropping columns failed to produce full column rank design matrix"
So I dropped the interactions and just did the other factors. I did one version as a glm without the random effect, and one as an lmer with the random effect.
I then tried to compare them using anova, but I don't understand the results, I'll put the code and results below.
Please can you guys have a look and tell me what you think, that would be great, Thank you! (Sorry this is so long)
mod2 <- glm(Buffer ~ Age + Sex + Captures
+ PC1+ PC2+ Lvl1_Av + Lvl1_Med
+ Lvl1_SD+ Lvl1_Sum+ Lvl2_Av+ Lvl2_Med
+ Lvl2_SD+ Lvl2_Sum+ Lvl3_Av
+ Lvl3_Med
+ Lvl3_SD
+ Lvl3_Sum
+ Lvl4_Av
+ Lvl4_Med
+ Lvl4_SD
+ Lvl4_Sum )
mod3 <- lmer(Buffer ~ Age + Sex + Captures
+ PC1+ PC2+ Lvl1_Av + Lvl1_Med
+ Lvl1_SD+ Lvl1_Sum+ Lvl2_Av+ Lvl2_Med
+ Lvl2_SD+ Lvl2_Sum+ Lvl3_Av
+ Lvl3_Med
+ Lvl3_SD
+ Lvl3_Sum
+ Lvl4_Av
+ Lvl4_Med
+ Lvl4_SD
+ Lvl4_Sum + (1|Fence))
anova(mod2, mod3, test="Chisq")
#And this is what I got
> anova(mod2, mod3, test="Chisq")
Analysis of Deviance Table
Model: gaussian, link: identity
Response: Buffer
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 80 7541772
Age 1 335703 79 7206069 < 2.2e-16 ***
Sex 1 1225073 78 5980996 < 2.2e-16 ***
Captures 1 1365027 77 4615968 < 2.2e-16 ***
PC1 1 9632 76 4606337 0.0001964 ***
PC2 1 194968 75 4411369 < 2.2e-16 ***
Lvl1_Av 1 883 74 4410486 0.2596526
Lvl1_Med 1 24511 73 4385975 2.848e-09 ***
Lvl1_SD 1 69605 72 4316370 < 2.2e-16 ***
Lvl1_Sum 1 4229768 71 86602 < 2.2e-16 ***
Lvl2_Av 1 250 70 86352 0.5485363
Lvl2_Med 1 360 69 85992 0.4713995
Lvl2_SD 1 237 68 85755 0.5589011
Lvl2_Sum 1 24078 67 61676 3.922e-09 ***
Lvl3_Av 1 1330 66 60346 0.1664550
Lvl3_Med 1 345 65 60001 0.4810493
Lvl3_SD 1 2 64 59999 0.9524658
Lvl3_Sum 1 1395 63 58604 0.1564064
Lvl4_Av 1 304 62 58300 0.5085230
Lvl4_Med 1 3928 61 54372 0.0174052 *
Lvl4_SD 1 85 60 54286 0.7260341
Lvl4_Sum 1 13301 59 40985 1.210e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Best Answer
You are specifying the model comparison wrong.
There is no reason to use
glm
with a Gaussian family. Uselm
as it is fully equivalent but computationally superior.You need to ensure that R uses the correct method for the
anova
generic. Since this is an S3 generic and method dispatch works only on the first argument, thelmer
model must be first. Your code actually callsanova.glm
, which does not do the intended model comparison.So, in summary, with an easy example based on the
iris
dataset: