Solved – get equal AIC, BIC and log likelihood for different models in LME framework

Tags: aic, bic, maximum likelihood, mixed model, r

I have two LME models with the same interaction, one containing both main effects and one containing only one main effect, say :

$$ H\_CE = Season + Crownlevel + Season:Crownlevel , random = 1|CollectorID $$
and
$$ H\_CE = Season + Season:Crownlevel , random = 1|CollectorID $$

Each factor has 4 levels, and every combination of Season, Crownlevel and CollectorID occurs in the data.
The AIC, BIC and log likelihood of both models are exactly equal. Given that the formula for AIC is

$$ \mathit{AIC} = 2k - 2\ln(L) $$

one would expect the AIC values to differ even if the likelihoods are exactly the same. After all, the two models have a different number of parameters. Or so I thought…

Trying this toy example in R :

library(nlme)

Season <- rep(as.factor(rep(letters[1:4],each=4)),4)
Crownlevel <-rep(as.factor(rep(letters[11:14],4)),4)
CollectorID <-rep(letters[20:23],each=16)
X <-  model.matrix(~Season+Crownlevel+Season:Crownlevel)
B <- c(1,1,-2,2,0.3,0.4,0.4,2,3,1,-2,-3,-4,2,1,2)
H_CE <- X %*% B + rnorm(16*4)
KBM <- data.frame(Season,Crownlevel,H_CE,CollectorID)

model1 <- lme(H_CE~Season+Crownlevel+Season:Crownlevel,data=KBM,
       method="ML",random=~1|CollectorID)
model1e <- lme(H_CE~Season+Season:Crownlevel,data=KBM,
       method="ML",random=~1|CollectorID)

I get :

anova(model1,model1e)
        Model df      AIC      BIC    logLik
model1      1 18 174.1834 213.0433 -69.09168
model1e     2 18 174.1834 213.0433 -69.09168

What am I missing here? Why are the numbers exactly equal? It must have something to do with the model specification, but I can't really see what.

The model specification in itself is faulty, I know that. But I can't explain what makes it return a different set of parameters, but exactly the same residuals, likelihood and degrees of freedom :

> all.equal(residuals(model1),residuals(model1e))
[1] TRUE

As fabians rightfully pointed out, both models are perfectly equivalent. Yet, I fail to see why in the AIC calculation the same value for the number of parameters k is used.

The k in the AIC is the df reported by anova(), which is 18 for both models; that explains everything.
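
A quick way to check which k enters the AIC is to pull the parameter count out of the log-likelihood object and rebuild the AIC by hand. The following is a minimal, self-contained sketch on freshly simulated data; the seed, the data frame `d` and the names `m_full` and `m_alt` are illustrative assumptions, not part of the original example.

```r
library(nlme)

set.seed(42)  # arbitrary seed, only to make this sketch reproducible

# Balanced toy design: 4 seasons x 4 crown levels x 4 collectors
d <- expand.grid(
  Season      = factor(letters[1:4]),
  Crownlevel  = factor(letters[11:14]),
  CollectorID = factor(letters[20:23])
)
# Collector-specific intercept plus residual noise (assumed data-generating model)
d$H_CE <- rnorm(4)[as.integer(d$CollectorID)] + rnorm(nrow(d))

m_full <- lme(H_CE ~ Season + Crownlevel + Season:Crownlevel,
              data = d, method = "ML", random = ~ 1 | CollectorID)
m_alt  <- lme(H_CE ~ Season + Season:Crownlevel,
              data = d, method = "ML", random = ~ 1 | CollectorID)

# Parameter count stored with the log-likelihood:
# 16 fixed-effect coefficients + random-intercept sd + residual sd
k_full <- attr(logLik(m_full), "df")
k_alt  <- attr(logLik(m_alt), "df")

# AIC = 2k - 2 ln(L), rebuilt from its parts
aic_by_hand <- 2 * k_full - 2 * as.numeric(logLik(m_full))

print(c(k_full = k_full, k_alt = k_alt))
print(c(by_hand = aic_by_hand, reported = AIC(m_full)))
```

Both fits carry the same parameter count and the same log likelihood, so every criterion built from those two quantities has to coincide.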

Best Answer

The models are exactly equivalent. In both models you effectively specify one parameter for each combination of levels of Season and Crownlevel - the only difference is the parameterization:

In the first model, you fit main effects for Season and Crownlevel and an interaction effect to capture the combination-specific deviations from the main effects.

In the second model, you specify only the main effect of season, and the interaction effect then captures the deviations for each crownlevel within a season.

H_CE~Season:Crownlevel

would also yield an equivalent model, with one parameter for each combination of season and crownlevel (minus one that is non-identifiable because of the intercept, i.e. constitutes the reference category).
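
The equivalence of the three parameterizations can be checked on the fixed-effects design matrices alone. The sketch below uses plain `lm()` and base R (no random effect), with simulated data and hypothetical object names; it compares the ranks of the design matrices, the rank of all three bound together, and the fitted values.

```r
set.seed(1)  # arbitrary seed for this sketch

# 4 x 4 factorial with 4 replicates per cell (illustrative data)
d <- expand.grid(
  Season     = factor(letters[1:4]),
  Crownlevel = factor(letters[11:14]),
  rep_id     = 1:4
)
d$y <- rnorm(nrow(d))

X1 <- model.matrix(~ Season + Crownlevel + Season:Crownlevel, data = d)
X2 <- model.matrix(~ Season + Season:Crownlevel, data = d)
X3 <- model.matrix(~ Season:Crownlevel, data = d)

rank_of <- function(M) qr(M)$rank

# X3 can contain a redundant column, so ranks are compared, not column counts
ranks <- c(X1 = rank_of(X1), X2 = rank_of(X2), X3 = rank_of(X3))

# If binding the matrices together does not raise the rank,
# all three span the same column space
rank_joint <- rank_of(cbind(X1, X2, X3))

fit1 <- lm(y ~ Season + Crownlevel + Season:Crownlevel, data = d)
fit2 <- lm(y ~ Season + Season:Crownlevel, data = d)
fit3 <- lm(y ~ Season:Crownlevel, data = d)

same_fit <- isTRUE(all.equal(fitted(fit1), fitted(fit2))) &&
  isTRUE(all.equal(fitted(fit1), fitted(fit3)))

print(ranks)
print(rank_joint)
print(same_fit)
```

Each matrix has rank 16, one identifiable parameter per Season-by-Crownlevel cell, which is why the fitted values and therefore the residuals agree.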

BTW: I don't think your model specification is faulty. Which specification is better depends on the inference you want to draw from your model.

Related Solutions

Solved – Model selection: testing the need for random-effects terms in longitudinal data

The likelihood ratio test is slightly incorrect (in general, conservative) for testing the significance of a random effect, because the null value ($\sigma^2=0$) is at the boundary of the feasible space, but in this case there is overwhelmingly strong evidence against the null hypothesis. The model with a random effect of individual is 15713-6772=8941 log-likelihood units better; twice the log-likelihood difference is approximately $\chi^2$ distributed under the null hypothesis, so the direct p-value calculation would give you ...

pchisq(2*8941,df=1,lower.tail=FALSE,log.p=TRUE)/log(10)
## -3885.251

... a p-value of approximately $10^{-3885}$.
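
For the boundary problem mentioned above, a commonly cited approximation for a single variance component is to refer the statistic to a 50:50 mixture of $\chi^2_0$ and $\chi^2_1$, which simply halves the naive p-value. That correction is an assumption added here for illustration, not something the calculation above relies on; the sketch works on the log10 scale because the p-value itself underflows.

```r
# Log-likelihood improvement quoted above
delta_loglik <- 15713 - 6772
lrt_stat     <- 2 * delta_loglik

# Naive chi-square(1) p-value on the log10 scale
log10_p_naive <- pchisq(lrt_stat, df = 1, lower.tail = FALSE,
                        log.p = TRUE) / log(10)

# Assumed 50:50 mixture correction: p_mixture = 0.5 * p_naive
log10_p_mixture <- log10_p_naive + log10(0.5)

print(c(naive = log10_p_naive, mixture = log10_p_mixture))
```

The correction shifts the exponent by only about 0.3, which shows how little the boundary issue matters when the evidence is this strong.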

You should really consider a random-slope model (random = ~time|id) as well.

Update: relative to the random-intercept model, the random-slopes model is again much better. The improvement is now 935 log-likelihood units, which, by the same calculation as above, corresponds to a rejection of the null hypothesis (among-individual variation in slope is equal to zero) with a p-value of "only" $10^{-408}$.

Solved – How can mixed-effect and fixed-effect generalised linear models be compared using BIC

As far as I can tell, you can compare the likelihoods of glmer() and glm() models, at least for family=binomial (I haven't tested this for other families). If the variance components are estimated to be zero, then the likelihoods should be identical, which is clearly the case here. Here is an example to illustrate this:

dat <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 6L, 
6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 
9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 
12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 
14L, 14L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 16L, 17L, 
17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 18L, 19L, 19L, 19L, 19L, 
19L, 20L, 20L, 20L, 20L, 20L), xi = c(0, 0, 0, 0, 0, -1, -1, 
-1, -1, -1, -1, -1, -1, -1, -1, 0.8, 0.8, 0.8, 0.8, 0.8, -0.9, 
-0.9, -0.9, -0.9, -0.9, 0.7, 0.7, 0.7, 0.7, 0.7, 0.1, 0.1, 0.1, 
0.1, 0.1, -1.7, -1.7, -1.7, -1.7, -1.7, 0.3, 0.3, 0.3, 0.3, 0.3, 
-2.8, -2.8, -2.8, -2.8, -2.8, 2.7, 2.7, 2.7, 2.7, 2.7, -0.1, 
-0.1, -0.1, -0.1, -0.1, -0.2, -0.2, -0.2, -0.2, -0.2, 2, 2, 2, 
2, 2, -0.6, -0.6, -0.6, -0.6, -0.6, 1.1, 1.1, 1.1, 1.1, 1.1, 
0.2, 0.2, 0.2, 0.2, 0.2, -0.4, -0.4, -0.4, -0.4, -0.4, 2, 2, 
2, 2, 2, -1.1, -1.1, -1.1, -1.1, -1.1), xij = c(1.1, 1.1, 0.2, 
0.9, 0.4, -2.1, -0.4, -0.7, 0, 0.8, -0.4, 0.2, -1, 0, -1.2, 1.1, 
1.9, 0.9, -1.4, -0.8, -0.3, -0.7, 0.7, -1.2, 1.1, -1.5, 0.3, 
-1.7, -2, 0.2, 2, -0.5, -1.2, -0.2, -2.3, -0.6, -0.6, -1.6, -0.4, 
-1.5, -0.5, 0.8, 0.1, -0.3, -0.7, 0.7, 0.3, -0.4, 0.4, 0.5, -0.8, 
0.6, 0.3, 0.6, 0.2, -0.8, 0, -2.3, 0.5, 0, 0.9, 0.6, 2.2, 0.6, 
-0.3, 0.3, 0.5, -2.2, 2, -0.6, -0.7, -0.3, -0.7, 1.7, -0.7, -0.3, 
0.6, -0.9, -1.9, -0.5, 1.6, -0.5, 0.4, 1.1, 0.5, -1.8, 1.2, 1.7, 
-1.1, 0.2, -0.6, -1.1, 2.1, 0.4, 0.9, 0.5, -2, 1.6, 0.1, 0.7), 
    yi = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
    0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L)), .Names = c("id", 
"xi", "xij", "yi"), row.names = c(NA, -100L), class = "data.frame")

library(lme4)

res0 <- glm(yi ~ xi + xij, data=dat, family=binomial)
summary(res0)

res1 <- glmer(yi ~ xi + xij + (1 | id), data=dat, family=binomial)
summary(res1)

logLik(res0)
logLik(res1)
anova(res1, res0)

The last three lines yield:

> logLik(res0)
'log Lik.' -29.96427 (df=3)
> logLik(res1)
'log Lik.' -29.96427 (df=4)
> 
> anova(res1, res0)
Data: dat
Models:
res0: yi ~ xi + xij
res1: yi ~ xi + xij + (1 | id)
     Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)
res0  3 65.929 73.744 -29.964   59.929                        
res1  4 67.929 78.349 -29.964   59.929     0      1          1

So, the (log)-likelihoods are identical, since the id level variance component is estimated to be zero. The AIC value of the mixed-effects model is therefore 2 points larger, as expected (since the model has one more parameter).
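
The AIC and BIC columns in the table can be reproduced by hand from the shared log likelihood and the two parameter counts. This is a small arithmetic sketch using only the numbers printed above; the variable names are illustrative.

```r
loglik <- -29.96427          # reported for both res0 and res1
n_obs  <- 100                # number of rows in dat
k      <- c(res0 = 3, res1 = 4)

aic <- 2 * k - 2 * loglik            # AIC = 2k - 2 ln(L)
bic <- log(n_obs) * k - 2 * loglik   # BIC = k ln(n) - 2 ln(L)

print(round(rbind(AIC = aic, BIC = bic), 3))
```

With identical likelihoods, the extra variance parameter costs 2 points of AIC and log(100), roughly 4.6 points, of BIC.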

One thing to note though: The default for glmer() is nAGQ=1, which means that the Laplace approximation is used. Let's use "proper" adaptive quadrature:

res1 <- glmer(yi ~ xi + xij + (1 | id), data=dat, family=binomial, nAGQ=7)
logLik(res0)
logLik(res1)
anova(res1, res0)

This yields:

>     logLik(res0)
'log Lik.' -29.96427 (df=3)
>     logLik(res1)
'log Lik.' -29.96427 (df=4)
>     anova(res1, res0)
Error in anova.merMod(res1, res0) : 
  GLMMs with nAGQ>1 have log-likelihoods incommensurate with glm() objects

The variance component is still estimated to be zero and the (log)-likelihoods are identical. But anova() throws an error indicating that these models should not be compared against each other.
