Solved – How to choose random- and fixed-effects structure in linear mixed models

likelihood-ratio, lme4-nlme, mixed-model, model-selection, repeated-measures

Consider the following data from a two-way within subjects design:

df <- "http://personality-project.org/r/datasets/R.appendix4.data"
df <- read.table(df,header=T)
head(df)

  Observation Subject Task Valence Recall
1           1     Jim Free     Neg      8
2           2     Jim Free     Neu      9
3           3     Jim Free     Pos      5
4           4     Jim Cued     Neg      7
5           5     Jim Cued     Neu      9
6           6     Jim Cued     Pos     10

I would like to analyze these data using linear mixed models. Considering all plausible fixed- and random-effects structures, there are multiple candidate models:

# different fixed effects with a random intercept
a0 <- lmer(Recall ~ 1              + (1 | Subject), data = df, REML = FALSE)
a1 <- lmer(Recall ~ Task           + (1 | Subject), data = df, REML = FALSE)
a2 <- lmer(Recall ~ Valence        + (1 | Subject), data = df, REML = FALSE)
a3 <- lmer(Recall ~ Task + Valence + (1 | Subject), data = df, REML = FALSE)
a4 <- lmer(Recall ~ Task * Valence + (1 | Subject), data = df, REML = FALSE)

# different fixed effects with a random intercept and uncorrelated random slopes
b0 <- lmer(Recall ~ 1              + (1 | Subject) + (0 + Task | Subject) + (0 + Valence | Subject), data = df, REML = FALSE)
b1 <- lmer(Recall ~ Task           + (1 | Subject) + (0 + Task | Subject) + (0 + Valence | Subject), data = df, REML = FALSE)
b2 <- lmer(Recall ~ Valence        + (1 | Subject) + (0 + Task | Subject) + (0 + Valence | Subject), data = df, REML = FALSE)
b3 <- lmer(Recall ~ Task + Valence + (1 | Subject) + (0 + Task | Subject) + (0 + Valence | Subject), data = df, REML = FALSE)
b4 <- lmer(Recall ~ Task * Valence + (1 | Subject) + (0 + Task | Subject) + (0 + Valence | Subject), data = df, REML = FALSE)

# different fixed effects with correlated random intercept and slopes
# (full variance-covariance matrix of the random effects)
c0 <- lmer(Recall ~ 1              + (1 + Valence + Task | Subject), data = df, REML = FALSE)
c1 <- lmer(Recall ~ Task           + (1 + Valence + Task | Subject), data = df, REML = FALSE)
c2 <- lmer(Recall ~ Valence        + (1 + Valence + Task | Subject), data = df, REML = FALSE)
c3 <- lmer(Recall ~ Task + Valence + (1 + Valence + Task | Subject), data = df, REML = FALSE)
c4 <- lmer(Recall ~ Task * Valence + (1 + Valence + Task | Subject), data = df, REML = FALSE)

  1. What is the recommended way to select the best-fitting model in this context? When using log-likelihood ratio tests, what is the recommended procedure: building models upwards (from the null model to the most complex model) or downwards (from the most complex model to the null model)? Stepwise inclusion or exclusion? Or is it recommended to put all models into one log-likelihood ratio test and select the model with the lowest p-value? And how should models that are not nested be compared?

  2. Is it recommended to first find the appropriate fixed-effects structure and then the appropriate random-effects structure, or the other way round (I have found references for both options…)?

  3. What is the recommended way of reporting results? Should one report the p-value from the log-likelihood ratio test comparing the full mixed model (with the effect in question) to the reduced model (without it)? Or is it better to use log-likelihood ratio tests to find the best-fitting model and then use lmerTest to report p-values for the effects in that best-fitting model?

Best Answer

I'm not sure there's really a canonical answer to this, but I'll give it a shot.

What is the recommended way to select the best-fitting model in this context? When using log-likelihood ratio tests, what is the recommended procedure: building models upwards (from the null model to the most complex model) or downwards (from the most complex model to the null model)? Stepwise inclusion or exclusion? Or is it recommended to put all models into one log-likelihood ratio test and select the model with the lowest p-value? And how should models that are not nested be compared?

It depends what your goals are.

  • In general you should be very, very careful about model selection (see e.g. this answer, or this post, or just Google "Harrell stepwise" ...).
  • If you are interested in having your p-values be meaningful (i.e. you are doing confirmatory hypothesis testing), you should not do model selection. However: it's not so clear to me whether model selection procedures are quite as bad if you are doing model selection on non-focal parts of the model, e.g. doing model selection on the random effects if your primary interest is inference on the fixed effects.
  • There's no such thing as "putting all the models in one likelihood ratio test" -- likelihood ratio testing is a pairwise procedure. If you wanted to do model selection (e.g.) on the random effects, I would probably recommend an "all at once" approach using information criteria as in this example -- that at least avoids some of the problems of stepwise approaches (but not of model selection more generally). A sketch of both comparison styles follows this list.
  • Barr et al. 2013 "Keep it maximal" Journal of Memory and Language (doi:10.1016/j.jml.2012.11.001) would recommend using the maximal model (only).
  • Shravan Vasishth disagrees, arguing that such models are going to be underpowered and hence problematic unless the data set is very large (and the signal-to-noise ratio is high).
  • Another reasonably defensible approach is to fit a large but reasonable model and then, if the fit is singular, remove terms until it isn't any more (the second sketch below illustrates this).
  • With some caveats (enumerated in the GLMM FAQ), you can use information criteria to compare non-nested models with differing random effects (although Brian Ripley disagrees: see the bottom of p. 6 here).
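
Since several of these points are procedural, here is a minimal sketch of the two comparison styles, using the models defined in the question (nothing below is specific to this data set):

# likelihood-ratio tests are pairwise and require nested models:
anova(a0, a3)   # do the main effects of Task and Valence improve the fit?
anova(a3, a4)   # does adding the Task:Valence interaction improve it further?

# an "all at once" information-criterion comparison handles the whole
# candidate set, including non-nested models (with the caveats noted above):
AIC(a0, a1, a2, a3, a4, b0, b1, b2, b3, b4, c0, c1, c2, c3, c4)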
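
And a sketch of the "large but reasonable model" strategy; c4 is the largest model fitted in the question, and lme4's isSingular() is the standard check:

isSingular(c4)   # TRUE if a variance is ~0 or a correlation is at +/-1
# if TRUE, simplify: first drop the correlations, i.e. move from
#   (1 + Valence + Task | Subject)
# to
#   (1 | Subject) + (0 + Task | Subject) + (0 + Valence | Subject)
# (the b-models above), then keep removing variance components until
# isSingular() returns FALSE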

Is it recommended to first find the appropriate fixed-effects structure and then the appropriate random-effects structure, or the other way round (I have found references for both options…)?

I don't think anyone knows. See the previous answer about model selection more generally. If you could define your goals sufficiently clearly (which few people do), the question might be answerable. If you have references for both options, it would be useful to edit your question to include them ... (For what it's worth, this example (already quoted above) uses information criteria to select the random-effects part, then eschews selection on the fixed-effects part of the model; a sketch of that workflow follows.)
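
For concreteness, a minimal sketch of that workflow, assuming the full Task * Valence specification is the fixed-effects structure of interest: hold the fixed part constant and compare only the random-effects structures.

# same fixed effects (Task * Valence), different random effects:
AIC(a4, b4, c4)
BIC(a4, b4, c4)
# (these models were fitted with ML; REML fits are often preferred when
# models differ only in their random effects)
# pick a random-effects structure here, then report fixed-effect inference
# from that model without further selection on the fixed-effects part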

What is the recommended way of reporting results? Should one report the p-value from the log-likelihood ratio test comparing the full mixed model (with the effect in question) to the reduced model (without it)? Or is it better to use log-likelihood ratio tests to find the best-fitting model and then use lmerTest to report p-values for the effects in that best-fitting model?

This is (alas) another difficult question. If you report the marginal effects as reported by lmerTest, you have to worry about marginality (e.g., whether the estimates of the main effects of A and B are meaningful when there is an A-by-B interaction in the model); this is a huge can of worms, but it is somewhat mitigated if you use sum-to-zero contrasts (which afex::mixed() recommends and sets by default). Balanced designs help a little bit too. If you really want to paper over all these cracks, I think I would recommend afex::mixed, which gives you output similar to lmerTest's but tries to deal with these issues; a sketch follows.
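
A minimal sketch, assuming the full model from the question is the one of interest; method = "LRT" requests one likelihood-ratio test per fixed effect, and afex handles the contrast coding:

library(afex)   # wraps lme4; sets sum-to-zero contrasts when fitting

m <- mixed(Recall ~ Task * Valence + (1 | Subject),
           data = df, method = "LRT")
m   # ANOVA-style table: one chi-square LRT and p-value per fixed effect

The lmerTest route would instead refit the model with lmerTest::lmer() and call anova() on the result for Satterthwaite F tests; afex's appeal is that it handles the contrasts and the full-vs-reduced comparisons for you.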