Solved – P value for interaction term in mixed effects models using lme4

Tags: lme4-nlme, mixed-model, p-value, r

I'm analysing some behavioural data using lme4 in R, mostly following Bodo Winter's excellent tutorials, but I'm not sure whether I'm handling interactions properly. Worse, no one else involved in this research uses mixed models, so I'm a bit adrift when it comes to making sure things are right.

Rather than just post a cry for help, I thought I should make my best effort at interpreting the problem, and then beg your collective corrections. A few asides first:

  • While writing this, I found this question, which shows that nlme gives p-values for interaction terms more directly, but I think the question is still worth asking in relation to lme4 (see the sketch after this list).
  • Livius' answer to this question provided links to a lot of additional reading, which I'll be trying to get through in the next few days, so I'll comment with any progress that brings.
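As a footnote to the first point: nlme reports p-values because it assumes nested random effects and derives denominator degrees of freedom from the grouping structure, whereas this design has crossed subject and target effects. A more direct route that keeps lme4 syntax is the lmerTest package, which adds Satterthwaite degrees of freedom to lmer() fits. A minimal sketch (the model name is illustrative, and data is the data frame read in below):

library(lmerTest)  # wraps lme4::lmer(); summary() gains df and p-values

interaction_lmerTest = lmer(dv ~ condition * appropriate +
                              (1 | subject) + (1 | target), data = data)
summary(interaction_lmerTest)  # fixed-effects table now shows Pr(>|t|)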

In my data, I have a dependent variable dv; a condition manipulation (0 = control, 1 = experimental, which should result in a higher dv); and a prerequisite, labelled appropriate: trials coded 1 should show the effect, but trials coded 0 might not, because a crucial factor is missing.

I have also included two random intercepts, for subject and for target, reflecting correlated dv values within each subject and within each of the 14 problems solved (each participant solved both a control and an experimental version of each problem).

library(lme4)
data = read.csv("data.csv")

# Random intercepts for subject and target appear in every model;
# condition and appropriate enter as fixed effects.
null_model        = lmer(dv ~ (1 | subject) + (1 | target), data = data)
mainfx_model      = lmer(dv ~ condition + appropriate +
                           (1 | subject) + (1 | target), data = data)
# condition * appropriate expands to condition + appropriate +
# condition:appropriate, so the main effects need not be spelled out again
interaction_model = lmer(dv ~ condition * appropriate +
                           (1 | subject) + (1 | target), data = data)
summary(interaction_model)

Output:

## Linear mixed model fit by REML ['lmerMod']
## ...excluded for brevity....
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept) 0.006594 0.0812  
##  target   (Intercept) 0.000557 0.0236  
##  Residual             0.210172 0.4584  
## Number of obs: 690, groups: subject, 38; target, 14
## 
## Fixed effects:
##                                Estimate Std. Error t value
## (Intercept)                    0.2518     0.0501    5.03
## conditioncontrol               0.0579     0.0588    0.98
## appropriate                   -0.0358     0.0595   -0.60
## conditioncontrol:appropriate  -0.1553     0.0740   -2.10
## 
## Correlation of Fixed Effects:
## ...excluded for brevity.

Comparing the two models with anova() then shows interaction_model to be a significantly better fit than mainfx_model, from which I conclude that there is a significant interaction (p = .035).

anova(mainfx_model, interaction_model)

Output:

## ...excluded for brevity....
##                   Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)  
## mainfx_model       6 913 940   -450      901                          
## interaction_model  7 910 942   -448      896  4.44      1      0.035 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
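A cross-check worth knowing about, though it is not part of my analysis above: the pbkrtest package can test the same comparison with a Kenward-Roger F-test or a parametric bootstrap, both of which lean less on the asymptotic chi-squared distribution than the likelihood ratio test:

library(pbkrtest)

# Kenward-Roger approximate F-test; works on the REML fits directly
KRmodcomp(interaction_model, mainfx_model)

# Parametric bootstrap LRT; slow, but the reference distribution is
# simulated rather than assumed (REML fits are refitted with ML internally)
PBmodcomp(interaction_model, mainfx_model, nsim = 1000)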

From there, I isolate the subset of the data for which the appropriate requirement is met (i.e., appropriate = 1), fit a null model and a model including condition as an effect to that subset, compare the two models using anova() once more, and lo, find that condition is a significant predictor.

# Trials where the prerequisite is met
good_data = data[data$appropriate == 1, ]
good_null_model   = lmer(dv ~ (1 | subject) + (1 | target), data = good_data)
good_mainfx_model = lmer(dv ~ condition + (1 | subject) + (1 | target), data = good_data)

anova(good_null_model, good_mainfx_model)

Output:

## Data: good_data
## Models:
## good_null_model: dv ~ (1 | subject) + (1 | target)
## good_mainfx_model: dv ~ condition + (1 | subject) + (1 | target)
##                   Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)  
## good_null_model    4 491 507   -241      483                          
## good_mainfx_model  5 487 507   -238      477  5.55      1      0.018 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
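To get an effect size alongside the significance test, a confidence interval for the condition effect in this subset is straightforward; a sketch using lme4's confint() (the nsim value is illustrative):

# Profile-likelihood CIs (lme4's default) for all parameters
confint(good_mainfx_model)

# Parametric-bootstrap CIs for the fixed effects only; slower, but
# with fewer distributional assumptions
confint(good_mainfx_model, parm = "beta_", method = "boot", nsim = 500)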

Best Answer

I don't see too much to say here. I think you have done a good job.

There are several ways people have discussed to test effects and get p-values for complicated mixed-effects models; there is a good overview here. The best options are computationally intensive methods (bootstrapping or Bayesian approaches), but these are more advanced than most people need. The second-best (and most convenient) method is a likelihood ratio test, which is what anova() (technically anova.merMod()) is doing.

It is important to apply a likelihood ratio test only to models that were fitted with full maximum likelihood (ML), not restricted maximum likelihood (REML). On the other hand, for your final model and for interpretation, you want REML, which confuses many people. Your output shows that you fitted your models with REML (the REML argument defaults to TRUE in lmer()), which on its own would make your test invalid. However, because this is such a common mistake, anova.merMod() has a refit argument that defaults to TRUE, and you didn't change it, so both models were refitted with ML before the comparison. The foresight of the package developers has saved you there.
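To make that refit explicit, here is a minimal sketch of the same comparison with both models fitted by full ML from the start; refit = FALSE then simply skips the now-redundant refit, and the result should match your anova() output above:

# Fit both models with full maximum likelihood up front
mainfx_ml      = lmer(dv ~ condition + appropriate +
                        (1 | subject) + (1 | target),
                      data = data, REML = FALSE)
interaction_ml = lmer(dv ~ condition * appropriate +
                        (1 | subject) + (1 | target),
                      data = data, REML = FALSE)

# Already ML fits, so no refit is needed for a valid likelihood ratio test
anova(mainfx_ml, interaction_ml, refit = FALSE)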

Regarding your strategy for unpacking the interaction, what you did is fine. Just bear in mind that the interaction test uses all of the data, whereas each stratified test uses only a subset. It is therefore possible to have a significant interaction while neither of the stratified tests is significant, which confuses some people. (That doesn't seem to have happened to you, though.)
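An alternative to subsetting, for what it's worth, is to test the simple effect of condition at each level of appropriate inside the full model, so the residual variance is estimated from all of the data. A sketch using the emmeans package, which is not something you need here, and which assumes condition and appropriate are coded as factors:

library(emmeans)

# Estimated marginal means of condition within each level of appropriate,
# taken from the full interaction model
emm = emmeans(interaction_model, ~ condition | appropriate)
pairs(emm)  # one simple-effect contrast per level of appropriate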