Solved – Comparing GLM/Lmer Models

generalized linear model, lme4-nlme, mixed model, r

I am struggling to choose the correct model for my study, and I hoped that someone might be able to help me or shed some light, please 🙂

I have lots of data about vegetation and individual preferences that I am trying to analyse. I tried using a mixed model (lmer) to begin with, with the 3 different fields where I repeated the study as a random effect. I began by making my model including every single interaction, but it was too much, and R gave me this error message:

"fixed-effect model matrix is rank deficient so dropping 151 columns / coefficients.
Error: Dropping columns failed to produce full column rank design matrix"

So I dropped the interactions and kept only the main effects. I fitted one version as a glm without the random effect, and one as an lmer with the random effect.
I then tried to compare them using anova, but I don't understand the results. The code and results are below.

Please can you guys have a look and tell me what you think, that would be great, Thank you! (Sorry this is so long)

mod2 <- glm(Buffer ~ Age + Sex + Captures
            + PC1 + PC2
            + Lvl1_Av + Lvl1_Med + Lvl1_SD + Lvl1_Sum
            + Lvl2_Av + Lvl2_Med + Lvl2_SD + Lvl2_Sum
            + Lvl3_Av + Lvl3_Med + Lvl3_SD + Lvl3_Sum
            + Lvl4_Av + Lvl4_Med + Lvl4_SD + Lvl4_Sum)

mod3 <- lmer(Buffer ~ Age + Sex + Captures
             + PC1 + PC2
             + Lvl1_Av + Lvl1_Med + Lvl1_SD + Lvl1_Sum
             + Lvl2_Av + Lvl2_Med + Lvl2_SD + Lvl2_Sum
             + Lvl3_Av + Lvl3_Med + Lvl3_SD + Lvl3_Sum
             + Lvl4_Av + Lvl4_Med + Lvl4_SD + Lvl4_Sum
             + (1 | Fence))

anova(mod2, mod3, test="Chisq")

#And this is what I got

> anova(mod2, mod3, test="Chisq")
Analysis of Deviance Table

Model: gaussian, link: identity

Response: Buffer

Terms added sequentially (first to last)


         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                        80    7541772              
Age       1   335703        79    7206069 < 2.2e-16 ***
Sex       1  1225073        78    5980996 < 2.2e-16 ***
Captures  1  1365027        77    4615968 < 2.2e-16 ***
PC1       1     9632        76    4606337 0.0001964 ***
PC2       1   194968        75    4411369 < 2.2e-16 ***
Lvl1_Av   1      883        74    4410486 0.2596526    
Lvl1_Med  1    24511        73    4385975 2.848e-09 ***
Lvl1_SD   1    69605        72    4316370 < 2.2e-16 ***
Lvl1_Sum  1  4229768        71      86602 < 2.2e-16 ***
Lvl2_Av   1      250        70      86352 0.5485363    
Lvl2_Med  1      360        69      85992 0.4713995    
Lvl2_SD   1      237        68      85755 0.5589011    
Lvl2_Sum  1    24078        67      61676 3.922e-09 ***
Lvl3_Av   1     1330        66      60346 0.1664550    
Lvl3_Med  1      345        65      60001 0.4810493    
Lvl3_SD   1        2        64      59999 0.9524658    
Lvl3_Sum  1     1395        63      58604 0.1564064    
Lvl4_Av   1      304        62      58300 0.5085230    
Lvl4_Med  1     3928        61      54372 0.0174052 *  
Lvl4_SD   1       85        60      54286 0.7260341    
Lvl4_Sum  1    13301        59      40985 1.210e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Best Answer

You are specifying the model comparison wrong.

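There is no reason to use glm with a Gaussian family. Use lm instead: it fits the same model and is cheaper to compute because it solves the least-squares problem directly rather than iterating.
You need to ensure that R uses the correct method for the anova generic. Since this is an S3 generic and method dispatch works only on the first argument, the lmer model must come first. Your code actually calls anova.glm, which does not perform the intended model comparison: as the "Terms added sequentially" header shows, the output is a sequential analysis-of-deviance table for mod2 alone.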
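Both points can be checked with a small base-R sketch. The generic `compare` and the classes `a` and `b` are hypothetical names made up for illustration; only `lm`, `glm` and the built-in iris data are real.

```r
# Toy S3 generic: UseMethod() dispatches on the class of the FIRST argument only.
compare <- function(x, ...) UseMethod("compare")
compare.a <- function(x, ...) "method for class a"
compare.b <- function(x, ...) "method for class b"

obj_a <- structure(list(), class = "a")
obj_b <- structure(list(), class = "b")

# Swapping the argument order changes which method runs.
first_a <- compare(obj_a, obj_b)
first_b <- compare(obj_b, obj_a)
print(c(first_a, first_b))

# A Gaussian glm and lm give the same coefficient estimates.
fit_lm  <- lm(Sepal.Length ~ Sepal.Width, data = iris)
fit_glm <- glm(Sepal.Length ~ Sepal.Width, data = iris, family = gaussian)
same_coefs <- isTRUE(all.equal(coef(fit_lm), coef(fit_glm)))
print(same_coefs)
```

The same mechanism is why the order of the models passed to anova matters.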

So, in summary, with an easy example based on the iris dataset:

library(lme4)

fit1 <- lm(Sepal.Length ~ Sepal.Width, data = iris)
fit2 <- lmer(Sepal.Length ~ Sepal.Width + (1 | Species), data = iris)

# The lmer fit goes first so that lme4's anova method is dispatched
anova(fit2, fit1, test="Chisq")
#refitting model(s) with ML (instead of REML)
#Data: iris
#Models:
#fit1: Sepal.Length ~ Sepal.Width
#fit2: Sepal.Length ~ Sepal.Width + (1 | Species)
#     Df    AIC    BIC   logLik deviance  Chisq Chi Df Pr(>Chisq)    
#fit1  3 371.99 381.02 -182.996   365.99                             
#fit2  4 200.53 212.57  -96.265   192.53 173.46      1  < 2.2e-16 ***
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
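
For readers who want to see where the Chisq column comes from, here is a rough sketch of the same likelihood-ratio test done by hand. It assumes lme4 is installed; the object names (`fit_lm`, `fit_lmm_ml`, `lrt_stat`) are chosen for this illustration only.

```r
library(lme4)

# Fixed-effects-only model; logLik() on an lm fit is the ML log-likelihood.
fit_lm <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# Mixed model fitted by ML (REML = FALSE), mirroring the refit that anova reports.
fit_lmm_ml <- lmer(Sepal.Length ~ Sepal.Width + (1 | Species),
                   data = iris, REML = FALSE)

ll_lm  <- logLik(fit_lm)
ll_lmm <- logLik(fit_lmm_ml)

# Likelihood-ratio statistic and difference in number of parameters.
lrt_stat <- 2 * (as.numeric(ll_lmm) - as.numeric(ll_lm))
df_diff  <- attr(ll_lmm, "df") - attr(ll_lm, "df")
p_value  <- pchisq(lrt_stat, df = df_diff, lower.tail = FALSE)

print(c(statistic = lrt_stat, df = df_diff, p = p_value))
```

The statistic should come out close to the 173.46 in the table above, with one degree of freedom for the added random-intercept variance.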

Related Solutions

Mixed Models in R – Checking Assumptions in lmer/lme

Q1: Yes - just like any regression model.

Q2: Just like general linear models, your outcome variable does not need to be normally distributed as a univariate variable. However, LME models assume that the residuals of the model are normally distributed. If they are not, a transformation or adding weights to the model is one way to address this (checking with diagnostic plots afterwards, of course).

Q3: plot(myModel.lme)

Q4: qqnorm(myModel.lme, ~ranef(., level=2)). This code produces QQ plots of the random effects at the chosen grouping level. LME models assume not only that the within-cluster residuals are normally distributed, but that the random effects at each level are as well. Vary the level from 0, 1, to 2 so that you can check the rat, task, and within-subject residuals.
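
As a minimal sketch of the checks in Q3 and Q4, the calls below use the Orthodont data that ships with nlme rather than the original rat/task data, which is not available here. The model and object names are assumptions for illustration, and this model has only one grouping level, so no level argument is needed.

```r
library(nlme)

# Example model with a single grouping level (Subject).
fit_lme <- lme(distance ~ age, data = Orthodont, random = ~ age | Subject)

# Q3: standardized residuals against fitted values.
p_resid <- plot(fit_lme)

# Normal QQ plot of the standardized within-group residuals.
p_qq_resid <- qqnorm(fit_lme, ~ resid(., type = "p"), abline = c(0, 1))

# Q4: normal QQ plots of the estimated random effects.
p_qq_ranef <- qqnorm(fit_lme, ~ ranef(.))

# The plots are lattice objects; print them to draw them.
print(p_resid)
print(p_qq_resid)
print(p_qq_ranef)
```

With nested grouping factors, adding level = k inside ranef() selects which grouping level is plotted, as in the call quoted in Q4.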

EDIT: I should also add that while normality is assumed and that transformation likely helps reduce problems with non-normal errors/random effects, it's not clear that all problems are actually resolved or that bias isn't introduced. If your data requires a transformation, then be cautious about estimation of the random effects. Here's a paper addressing this.

R Software – Using the predict() Function for lmer Mixed Effects Models

It's easy to get confused by the presentation of coefficients when you call coef(fit2). Look at the summary of fit2:

> summary(fit2)
Linear mixed model fit by REML ['lmerMod']
Formula: Recall ~ (1 | Subject/Time) + Caffeine
   Data: data

REML criterion at convergence: 444.5

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.88657 -0.46382 -0.06054  0.31430  2.16244 

Random effects:
 Groups       Name        Variance Std.Dev.
 Time:Subject (Intercept)  558.4   23.63   
 Subject      (Intercept) 2458.0   49.58   
 Residual                  675.0   25.98   
Number of obs: 45, groups:  Time:Subject, 15; Subject, 5

Fixed effects:
            Estimate Std. Error t value
(Intercept) 61.91827   25.04930   2.472
Caffeine     0.21163    0.07439   2.845

Correlation of Fixed Effects:
         (Intr)
Caffeine -0.365

There is an overall intercept of 61.92 for the model, with a caffeine coefficient of 0.212. So for caffeine = 95 you predict an average recall of 61.918 + 0.21163 × 95 ≈ 82.02.

Instead of using coef, use ranef to get the difference of each random-effect intercept from the mean intercept at the next higher level of nesting:

> ranef(fit2)
$`Time:Subject`
         (Intercept)
0:Jason    13.112130
0:Jim      33.046151
0:Ron      -3.197895
0:Tina      8.893985
0:Victor   24.392738
1:Jason    -2.068105
1:Jim      -9.260334
1:Ron      -4.428399
1:Tina      6.515667
1:Victor   17.265589
2:Jason   -18.203436
2:Jim     -19.835771
2:Ron      -3.473053
2:Tina    -17.180791
2:Victor  -25.578477

$Subject
       (Intercept)
Jason   -31.513915
Jim      17.387103
Ron     -48.856516
Tina     -7.796104
Victor   70.779432

The values for Jim at Time=0 will differ from the average prediction at caffeine = 95 (61.918 + 0.21163 × 95 ≈ 82.02) by the sum of both his Subject and his Time:Subject coefficients:

$$82.02+17.39+33.05=132.46$$

which agrees with 132.46 to two decimal places.
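
The arithmetic can be reproduced from the printed output alone. This is only a sketch: the numbers are copied by hand from the summary(fit2) and ranef(fit2) listings above, and the variable names are made up for illustration.

```r
# Values copied from the summary(fit2) and ranef(fit2) output shown above;
# the fitted object itself is not available here.
fixed_intercept <- 61.91827
caffeine_slope  <- 0.21163
jim_subject     <- 17.387103  # Subject-level deviation for Jim
jim_time0       <- 33.046151  # Time:Subject deviation for 0:Jim

# Population-level prediction at caffeine = 95 (random effects set to zero).
population_pred <- fixed_intercept + caffeine_slope * 95

# Jim's prediction at Time = 0 adds both of his random-intercept deviations.
jim_time0_pred <- population_pred + jim_subject + jim_time0

# jim_time0 rounds to 132.46
print(round(c(population = population_pred, jim_time0 = jim_time0_pred), 2))
```

With the fitted model at hand, lme4's predict() with re.form = NA should return the population-level value, while the default includes all random effects.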

The intercept values returned by coef seem to represent the overall intercept plus the Subject or Time:Subject specific differences, so it's harder to work with those; if you tried to do the above calculation with the coef values you would be double-counting the overall intercept.
