As far as I can tell, you can compare the likelihoods of glmer() and glm() models, at least for family=binomial (I haven't tested this for other families). If the variance components are estimated to be zero, then the likelihoods should be identical, and that is indeed the case. Here is an example to illustrate this:
dat <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 6L,
6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 9L, 9L,
9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L,
12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L,
14L, 14L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 16L, 17L,
17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 18L, 19L, 19L, 19L, 19L,
19L, 20L, 20L, 20L, 20L, 20L), xi = c(0, 0, 0, 0, 0, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, 0.8, 0.8, 0.8, 0.8, 0.8, -0.9,
-0.9, -0.9, -0.9, -0.9, 0.7, 0.7, 0.7, 0.7, 0.7, 0.1, 0.1, 0.1,
0.1, 0.1, -1.7, -1.7, -1.7, -1.7, -1.7, 0.3, 0.3, 0.3, 0.3, 0.3,
-2.8, -2.8, -2.8, -2.8, -2.8, 2.7, 2.7, 2.7, 2.7, 2.7, -0.1,
-0.1, -0.1, -0.1, -0.1, -0.2, -0.2, -0.2, -0.2, -0.2, 2, 2, 2,
2, 2, -0.6, -0.6, -0.6, -0.6, -0.6, 1.1, 1.1, 1.1, 1.1, 1.1,
0.2, 0.2, 0.2, 0.2, 0.2, -0.4, -0.4, -0.4, -0.4, -0.4, 2, 2,
2, 2, 2, -1.1, -1.1, -1.1, -1.1, -1.1), xij = c(1.1, 1.1, 0.2,
0.9, 0.4, -2.1, -0.4, -0.7, 0, 0.8, -0.4, 0.2, -1, 0, -1.2, 1.1,
1.9, 0.9, -1.4, -0.8, -0.3, -0.7, 0.7, -1.2, 1.1, -1.5, 0.3,
-1.7, -2, 0.2, 2, -0.5, -1.2, -0.2, -2.3, -0.6, -0.6, -1.6, -0.4,
-1.5, -0.5, 0.8, 0.1, -0.3, -0.7, 0.7, 0.3, -0.4, 0.4, 0.5, -0.8,
0.6, 0.3, 0.6, 0.2, -0.8, 0, -2.3, 0.5, 0, 0.9, 0.6, 2.2, 0.6,
-0.3, 0.3, 0.5, -2.2, 2, -0.6, -0.7, -0.3, -0.7, 1.7, -0.7, -0.3,
0.6, -0.9, -1.9, -0.5, 1.6, -0.5, 0.4, 1.1, 0.5, -1.8, 1.2, 1.7,
-1.1, 0.2, -0.6, -1.1, 2.1, 0.4, 0.9, 0.5, -2, 1.6, 0.1, 0.7),
yi = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L)), .Names = c("id",
"xi", "xij", "yi"), row.names = c(NA, -100L), class = "data.frame")
library(lme4)
res0 <- glm(yi ~ xi + xij, data=dat, family=binomial)
summary(res0)
res1 <- glmer(yi ~ xi + xij + (1 | id), data=dat, family=binomial)
summary(res1)
logLik(res0)
logLik(res1)
anova(res1, res0)
The last three lines yield:
> logLik(res0)
'log Lik.' -29.96427 (df=3)
> logLik(res1)
'log Lik.' -29.96427 (df=4)
>
> anova(res1, res0)
Data: dat
Models:
res0: yi ~ xi + xij
res1: yi ~ xi + xij + (1 | id)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
res0 3 65.929 73.744 -29.964 59.929
res1 4 67.929 78.349 -29.964 59.929 0 1 1
So, the (log-)likelihoods are identical, since the id-level variance component is estimated to be zero. The AIC of the mixed-effects model is therefore 2 points larger, as expected, since that model has one more parameter.
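To make the bookkeeping explicit, the AIC values in the anova() table can be reproduced by hand from the reported log-likelihood (this is just arithmetic on the numbers shown above, not anything lme4-specific):

```r
# AIC = -2*logLik + 2*k, where k is the number of estimated parameters.
# With identical log-likelihoods, the extra variance component adds
# exactly 2 points:
ll <- -29.96427                  # logLik reported by both models
aic_glm   <- -2 * ll + 2 * 3     # k = 3: intercept, xi, xij
aic_glmer <- -2 * ll + 2 * 4     # k = 4: same, plus the id variance
round(aic_glm, 3)                # 65.929, matching the anova() table
aic_glmer - aic_glm              # 2
```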
One thing to note, though: the default for glmer() is nAGQ=1, which means that the Laplace approximation is used. Let's use "proper" adaptive Gauss-Hermite quadrature instead:
res1 <- glmer(yi ~ xi + xij + (1 | id), data=dat, family=binomial, nAGQ=7)
logLik(res0)
logLik(res1)
anova(res1, res0)
This yields:
> logLik(res0)
'log Lik.' -29.96427 (df=3)
> logLik(res1)
'log Lik.' -29.96427 (df=4)
> anova(res1, res0)
Error in anova.merMod(res1, res0) :
GLMMs with nAGQ>1 have log-likelihoods incommensurate with glm() objects
The variance component is still estimated to be zero and the (log-)likelihoods are identical. But anova() now throws an error indicating that these models should not be compared with each other.
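For what it's worth, in this particular fit the likelihood-ratio statistic can still be computed by hand from the printed log-likelihoods; since both are -29.96427, it is zero regardless of the scale issue anova() warns about. (This hand calculation is exactly what anova() refuses to do in general, and testing a variance component on the boundary of its parameter space makes the usual chi-square reference distribution conservative anyway, so take it as an illustration only.)

```r
# Likelihood-ratio test statistic from the two reported log-likelihoods:
ll0 <- -29.96427   # logLik(res0), df = 3
ll1 <- -29.96427   # logLik(res1), df = 4
lrt <- -2 * (ll0 - ll1)                   # 0 in this example
pchisq(lrt, df = 1, lower.tail = FALSE)   # p-value of 1
```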
Best Answer
According to an informal document by Burnham, whom I regard as one of the leading experts on the AIC, the notion that models need to be nested to use the AIC for model comparison is a myth. Here is the pdf; see item number 2.
While we're on the topic, I would suggest using the AICc instead of the AIC; Burnham and Anderson (2004) recommend it as a better default for model selection because of its bias correction in finite samples.
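For the example above, the small-sample correction is easy to apply by hand: AICc = AIC + 2k(k+1)/(n - k - 1), where k is the number of parameters and n the sample size. (Here I treat all n = 100 rows as the sample size, which is itself a judgment call for clustered data.)

```r
# Small-sample corrected AIC (Burnham & Anderson), applied to the
# AIC values from the anova() table above:
aicc <- function(aic, k, n) aic + 2 * k * (k + 1) / (n - k - 1)
aicc(65.929, k = 3, n = 100)   # glm fit:   66.179
aicc(67.929, k = 4, n = 100)   # glmer fit: approx. 68.350
```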