As far as I can tell, you can compare the likelihoods of glmer() and glm() models, at least for family=binomial (I haven't tested this for other families). If the variance components are estimated to be zero, then the likelihoods should be identical, and that is indeed the case. Here is an example to illustrate this:
dat <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 6L,
6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 9L, 9L,
9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L,
12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L,
14L, 14L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 16L, 17L,
17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 18L, 19L, 19L, 19L, 19L,
19L, 20L, 20L, 20L, 20L, 20L), xi = c(0, 0, 0, 0, 0, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, 0.8, 0.8, 0.8, 0.8, 0.8, -0.9,
-0.9, -0.9, -0.9, -0.9, 0.7, 0.7, 0.7, 0.7, 0.7, 0.1, 0.1, 0.1,
0.1, 0.1, -1.7, -1.7, -1.7, -1.7, -1.7, 0.3, 0.3, 0.3, 0.3, 0.3,
-2.8, -2.8, -2.8, -2.8, -2.8, 2.7, 2.7, 2.7, 2.7, 2.7, -0.1,
-0.1, -0.1, -0.1, -0.1, -0.2, -0.2, -0.2, -0.2, -0.2, 2, 2, 2,
2, 2, -0.6, -0.6, -0.6, -0.6, -0.6, 1.1, 1.1, 1.1, 1.1, 1.1,
0.2, 0.2, 0.2, 0.2, 0.2, -0.4, -0.4, -0.4, -0.4, -0.4, 2, 2,
2, 2, 2, -1.1, -1.1, -1.1, -1.1, -1.1), xij = c(1.1, 1.1, 0.2,
0.9, 0.4, -2.1, -0.4, -0.7, 0, 0.8, -0.4, 0.2, -1, 0, -1.2, 1.1,
1.9, 0.9, -1.4, -0.8, -0.3, -0.7, 0.7, -1.2, 1.1, -1.5, 0.3,
-1.7, -2, 0.2, 2, -0.5, -1.2, -0.2, -2.3, -0.6, -0.6, -1.6, -0.4,
-1.5, -0.5, 0.8, 0.1, -0.3, -0.7, 0.7, 0.3, -0.4, 0.4, 0.5, -0.8,
0.6, 0.3, 0.6, 0.2, -0.8, 0, -2.3, 0.5, 0, 0.9, 0.6, 2.2, 0.6,
-0.3, 0.3, 0.5, -2.2, 2, -0.6, -0.7, -0.3, -0.7, 1.7, -0.7, -0.3,
0.6, -0.9, -1.9, -0.5, 1.6, -0.5, 0.4, 1.1, 0.5, -1.8, 1.2, 1.7,
-1.1, 0.2, -0.6, -1.1, 2.1, 0.4, 0.9, 0.5, -2, 1.6, 0.1, 0.7),
yi = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L)), .Names = c("id",
"xi", "xij", "yi"), row.names = c(NA, -100L), class = "data.frame")
library(lme4)
res0 <- glm(yi ~ xi + xij, data=dat, family=binomial)
summary(res0)
res1 <- glmer(yi ~ xi + xij + (1 | id), data=dat, family=binomial)
summary(res1)
logLik(res0)
logLik(res1)
anova(res1, res0)
The last three lines yield:
> logLik(res0)
'log Lik.' -29.96427 (df=3)
> logLik(res1)
'log Lik.' -29.96427 (df=4)
>
> anova(res1, res0)
Data: dat
Models:
res0: yi ~ xi + xij
res1: yi ~ xi + xij + (1 | id)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
res0 3 65.929 73.744 -29.964 59.929
res1 4 67.929 78.349 -29.964 59.929 0 1 1
So, the (log-)likelihoods are identical, since the id-level variance component is estimated to be zero. The AIC of the mixed-effects model is therefore 2 points larger, as expected, since that model has one more parameter.
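To make the bookkeeping explicit, the AIC values in the anova() table can be reproduced by hand from the reported log-likelihood (this is just arithmetic on the numbers shown above, not anything lme4-specific):

```r
# AIC = -2*logLik + 2*k, where k is the number of estimated parameters.
# With identical log-likelihoods, the extra variance component adds
# exactly 2 points:
ll <- -29.96427                  # logLik reported by both models
aic_glm   <- -2 * ll + 2 * 3     # k = 3: intercept, xi, xij
aic_glmer <- -2 * ll + 2 * 4     # k = 4: same, plus the id variance
round(aic_glm, 3)                # 65.929, matching the anova() table
aic_glmer - aic_glm              # 2
```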
One thing to note, though: the default for glmer() is nAGQ=1, which means that the Laplace approximation is used. Let's use "proper" adaptive Gauss-Hermite quadrature instead:
res1 <- glmer(yi ~ xi + xij + (1 | id), data=dat, family=binomial, nAGQ=7)
logLik(res0)
logLik(res1)
anova(res1, res0)
This yields:
> logLik(res0)
'log Lik.' -29.96427 (df=3)
> logLik(res1)
'log Lik.' -29.96427 (df=4)
> anova(res1, res0)
Error in anova.merMod(res1, res0) :
GLMMs with nAGQ>1 have log-likelihoods incommensurate with glm() objects
The variance component is still estimated to be zero and the (log-)likelihoods are identical. But anova() now throws an error indicating that these models should not be compared with each other.
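For what it's worth, in this particular fit the likelihood-ratio statistic can still be computed by hand from the printed log-likelihoods; since both are -29.96427, it is zero regardless of the scale issue anova() warns about. (This hand calculation is exactly what anova() refuses to do in general, and testing a variance component on the boundary of its parameter space makes the usual chi-square reference distribution conservative anyway, so take it as an illustration only.)

```r
# Likelihood-ratio test statistic from the two reported log-likelihoods:
ll0 <- -29.96427   # logLik(res0), df = 3
ll1 <- -29.96427   # logLik(res1), df = 4
lrt <- -2 * (ll0 - ll1)                   # 0 in this example
pchisq(lrt, df = 1, lower.tail = FALSE)   # p-value of 1
```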
Best Answer
According to an informal document by Burnham, whom I regard as one of the leading experts on the AIC, the notion that models need to be nested to use the AIC for model comparison is a myth. Here is the pdf; see item number 2.
While we're on the topic, I would suggest using the AICc instead of the AIC; Burnham and Anderson (2004) recommend it as a better default for model selection because of its bias correction in finite samples.
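For the example above, the small-sample correction is easy to apply by hand: AICc = AIC + 2k(k+1)/(n - k - 1), where k is the number of parameters and n the sample size. (Here I treat all n = 100 rows as the sample size, which is itself a judgment call for clustered data.)

```r
# Small-sample corrected AIC (Burnham & Anderson), applied to the
# AIC values from the anova() table above:
aicc <- function(aic, k, n) aic + 2 * k * (k + 1) / (n - k - 1)
aicc(65.929, k = 3, n = 100)   # glm fit:   66.179
aicc(67.929, k = 4, n = 100)   # glmer fit: approx. 68.350
```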