Solved – Allowed comparisons of mixed effects models (random effects primarily)

Tags: likelihood-ratio, lme4-nlme, mixed-model, r

I've been looking at mixed effects modelling using the lme4 package in R. I'm primarily using the lmer command so I'll pose my question through code that uses that syntax. I suppose a general easy question might be, is it OK to compare any two models constructed in lmer using likelihood ratios based on identical datasets? I believe the answer to that must be, "no", but I could be incorrect. I've read conflicting information on whether the random effects have to be the same or not, and what component of the random effects is meant by that? So, I'll present a few examples. I'll take them from repeated measures data using word stimuli, perhaps something like Baayen (2008) would be useful in interpreting.

Let's say I have a model where there are two fixed effects predictors, we'll call them A, and B, and some random effects… words and subjects that perceived them. I might construct a model like the following.

m <- lmer( y ~ A + B + (1|words) + (1|subjects) )

(note that I've intentionally left out data = and we'll assume I always mean REML = FALSE for clarity's sake)

Now, of the following models, which are OK to compare with a likelihood ratio to the one above and which are not?

m1 <- lmer( y ~ A + B + (A+B|words) + (1|subjects) )
m2 <- lmer( y ~ A + B + (1|subjects) )              
m3 <- lmer( y ~ A + B + (C|words) + (A+B|subjects) )
m4 <- lmer( y ~ A + B + (1|words) )                 
m5 <- lmer( y ~ A * B + (1|subjects) )

I acknowledge that the interpretation of some of these differences may be difficult, or impossible. But let's put that aside for a second. I just want to know if there's something fundamental in the changes here that precludes the possibility of comparing. I also want to know whether, if LRs are OK, AIC comparisons are OK as well.

Best Answer

Using maximum likelihood, any of these can be compared with AIC; if the fixed effects are the same (m1 to m4), using either REML or ML is fine, with REML usually preferred, but if they are different, only ML can be used. However, interpretation is usually difficult when both fixed effects and random effects are changing, so in practice, most recommend changing only one or the other at a time.
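As a rough illustration of the AIC route, here is a minimal sketch using the sleepstudy data that ships with lme4. The data set and the model names (m_int, m_slope, m_null) are my stand-ins, not the models from the question. All three are fitted with REML = FALSE so that the comparison stays valid even where the fixed effects differ.

```r
library(lme4)

# sleepstudy ships with lme4; it stands in for the question's data here
data(sleepstudy, package = "lme4")

# Same fixed effects, different random effects
m_int   <- lmer(Reaction ~ Days + (1 | Subject),    data = sleepstudy, REML = FALSE)
m_slope <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy, REML = FALSE)

# Different fixed effects: compare ML fits only
m_null  <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy, REML = FALSE)

# One row per model, with the parameter count (df) and the AIC
aic_table <- AIC(m_int, m_slope, m_null)
print(aic_table)
```

Lower AIC is better, but as noted above the comparison is easiest to interpret when only the fixed or only the random part changes between two models.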

Using the likelihood ratio test is possible but messy because the usual chi-squared approximation doesn't hold when testing if a variance component is zero. See Aniko's answer for details. (Kudos to Aniko for both reading the question more carefully than I did and reading my original answer carefully enough to notice that it missed this point. Thanks!)
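To make the boundary problem concrete: one commonly cited large-sample correction (not necessarily the one the referenced answer proposes) replaces the usual chi-squared reference distribution with a 50:50 mixture of two chi-squared distributions, for example df 0 and 1 when a single variance is tested against zero. The sketch below uses base R only; the function name mixture_pvalue and the statistic 3.0 are made up for illustration.

```r
# p-value of a likelihood ratio statistic under a 50:50 mixture of
# chi-squared(df_lo) and chi-squared(df_hi); df_lo = 0 is a point mass at zero
mixture_pvalue <- function(lrt_stat, df_lo, df_hi) {
  p_lo <- if (df_lo == 0) {
    as.numeric(lrt_stat <= 0)
  } else {
    pchisq(lrt_stat, df = df_lo, lower.tail = FALSE)
  }
  p_hi <- pchisq(lrt_stat, df = df_hi, lower.tail = FALSE)
  0.5 * p_lo + 0.5 * p_hi
}

# Hypothetical statistic from testing one variance component against zero
lrt_stat    <- 3.0
naive_p     <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)
corrected_p <- mixture_pvalue(lrt_stat, df_lo = 0, df_hi = 1)
print(c(naive = naive_p, corrected = corrected_p))
```

In this single-variance case the corrected p-value is half the naive one, so the naive test is conservative. The mixture is itself only an approximation; simulation-based tests are another option.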

Pinheiro/Bates is the classic reference; it describes the nlme package, but the theory is the same. Well, mostly the same; Doug Bates has changed his recommendations on inference since writing that book and the new recommendations are reflected in the lme4 package. But that's more than I want to get into here. A more readable reference is Weiss (2005), Modeling Longitudinal Data.

Related Solutions

Solved – How to choose random- and fixed-effects structure in linear mixed models

I'm not sure there's really a canonical answer to this, but I'll give it a shot.

What is the recommended way to select the best fitting model in this context? When using log-likelihood ratio tests what is the recommended procedure? Generating models upwards (from null model to most complex model) or downwards (from most complex model to null model)? Stepwise inclusion or exclusion? Or is it recommended to put all models in one log-likelihood ratio test and select the model with the lowest p-value? How to compare models that are not nested?

It depends what your goals are.

In general you should be very, very careful about model selection (see e.g. this answer, or this post, or just Google "Harrell stepwise" ...).
If you are interested in having your p-values be meaningful (i.e. you are doing confirmatory hypothesis testing), you should not do model selection. However: it's not so clear to me whether model selection procedures are quite as bad if you are doing model selection on non-focal parts of the model, e.g. doing model selection on the random effects if your primary interest is inference on the fixed effects.
There's no such thing as "putting all the models in one likelihood ratio test" -- likelihood ratio testing is a pairwise procedure. If you wanted to do model selection (e.g.) on the random effects, I would probably recommend an "all at once" approach using information criteria as in this example -- that at least avoids some of the problems of stepwise approaches (but not of model selection more generally).
Barr et al. 2013 "Keep it maximal" Journal of Memory and Language (doi:10.1016/j.jml.2012.11.001) would recommend using the maximal model (only).
Shravan Vasishth disagrees, arguing that such models are going to be underpowered and hence problematic unless the data set is very large (and the signal-to-noise ratio is high).
Another reasonably defensible approach is to fit a large but reasonable model and then, if the fit is singular, remove terms until it isn't any more.
With some caveats (enumerated in the GLMM FAQ), you can use information criteria to compare non-nested models with differing random effects (although Brian Ripley disagrees: see bottom of p. 6 here)
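The "all at once" comparison and the singular-fit check from the list above can be sketched roughly as follows. This again assumes the sleepstudy data from lme4 as a stand-in; the candidate set is purely illustrative and is not a recommendation for any particular data set.

```r
library(lme4)

data(sleepstudy, package = "lme4")

# Candidate random-effects structures; the fixed effects are held constant
candidates <- list(
  intercept_only = Reaction ~ Days + (1 | Subject),
  uncorrelated   = Reaction ~ Days + (Days || Subject),
  correlated     = Reaction ~ Days + (Days | Subject)
)

# Fit every candidate by ML so the information criteria are comparable
fits <- lapply(candidates, function(f) lmer(f, data = sleepstudy, REML = FALSE))

# One table: AIC for each model, plus a flag for singular fits
selection_table <- data.frame(
  model    = names(fits),
  AIC      = sapply(fits, AIC),
  singular = sapply(fits, isSingular)
)
print(selection_table[order(selection_table$AIC), ])
```

Every model is fitted once and ranked in a single table, which avoids the path dependence of stepwise procedures. It does not remove the general concerns about model selection raised above.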

Is it recommended to first find the appropriate fixed-effects structure and then the appropriate random-effects structure or the other way round (I have found references for both options...)?

I don't think anyone knows. See previous answer about model selection more generally. If you could define your goals sufficiently clearly (which few people do), the question might be answerable. If you have references for both options, it would be useful to edit your question to include them ... (For what it's worth, this example (already quoted above) uses information criteria to select the random effects part, then eschews selection on the fixed-effect part of the model.)

What is the recommended way of reporting results? Reporting the p-value from the log-likelihood ratio test comparing the full mixed-model (with the effect in question) to reduced model (without the effect in question). Or is it better to use log-likelihood ratio test to find the best fitting model and then use lmerTest to report p-values from the effects in the best fitting model?

This is (alas) another difficult question. If you report the marginal effects as reported by lmerTest, you have to worry about marginality (e.g., whether the estimates of the main effects of A and B are meaningful when there is an A-by-B interaction in the model); this is a huge can of worms, but is somewhat mitigated if you use contrasts="sum" as recommended by afex::mixed(). Balanced designs help a little bit too. If you really want to paper over all these cracks, I think I would recommend afex::mixed, which gives you output similar to lmerTest, but tries to deal with these issues.
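For the first reporting option in the question (an LRT between the full model and the same model without the effect of interest), a minimal sketch with plain lme4 might look like this. The sleepstudy data and the names m_full and m_reduced are assumptions for illustration. Both models are fitted by ML because their fixed effects differ.

```r
library(lme4)

data(sleepstudy, package = "lme4")

# Full model, and the same model without the fixed effect of interest (Days);
# the random-effects structure is identical in both
m_full    <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy, REML = FALSE)
m_reduced <- lmer(Reaction ~ 1    + (Days | Subject), data = sleepstudy, REML = FALSE)

# Pairwise likelihood ratio test; the second row holds the comparison
lrt   <- anova(m_reduced, m_full)
p_lrt <- lrt[["Pr(>Chisq)"]][2]
print(lrt)
```

Because only one fixed effect differs and it is not part of an interaction, the marginality issue does not arise in this particular comparison.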

Solved – Nested mixed effects with lme4

I would say

response ~ brightness+duration+(duration|subject)

would probably be a little better. (The simpler (1|duration:subject) model is not necessarily wrong, but might be oversimplified. If I were a peer reviewer of this work I would certainly ask for a justification of the simpler model ...) The (duration|subject) model is a "random-slopes" model, more or less (although if you have coded duration as a categorical (factor or ordered factor) variable the thing that varies randomly among subjects is not a slope per se, but a between-duration difference). The specification you have ((1|subject:duration)) assumes all subject-duration effects are drawn from a single (iid) Normal distribution; (duration|subject) assumes that the duration effects for a single individual are drawn from a $3 \times 3$ multivariate Normal distribution.

More precisely: the random effect specification (1|subject:duration) gives the following model for the conditional modes/BLUPs of subject $s$ for duration $d$ (or duration effect $d$, depending on how the model is parameterized) $$ b_{sd} \sim \textrm{Normal}(0,\sigma_{sd}^2) $$ whereas (duration|subject) gives

$$ \begin{split} b_{s\cdot} & \sim \textrm{MVN}( \mathbf 0,\Sigma) \\ \Sigma & = \left( \begin{array}{ccc} \sigma^2_1 & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma^2_{2} & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma^2_3 \\ \end{array} \right) \end{split} $$ i.e., the different duration levels each have different among-subjects variances, and the among-subject variation in different duration levels is correlated ($\Sigma$ is a general symmetric positive (semi)definite matrix). To get back to the previous model you would need to restrict $\sigma_1^2=\sigma_2^2=\sigma_3^2=\sigma_{sd}^2$ and all of the off-diagonal elements would be zero.
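The difference in flexibility shows up in the number of covariance parameters each specification estimates. The sketch below uses simulated data; the variable names, effect sizes and sample sizes are invented for illustration. With a three-level duration factor, (1|subject:duration) estimates a single variance, while (duration|subject) estimates the three variances and three covariances in $\Sigma$. Since the simulated data contain only a subject intercept effect, the larger model may well report a singular fit.

```r
library(lme4)

set.seed(1)

# Simulated design: 20 subjects x 3 durations x 4 replicates (all invented)
d <- expand.grid(
  subject  = factor(1:20),
  duration = factor(c("short", "medium", "long"),
                    levels = c("short", "medium", "long")),
  rep      = 1:4
)
d$brightness <- rnorm(nrow(d))
subject_effect <- rnorm(20, sd = 1)
d$response <- 0.5 * d$brightness + as.integer(d$duration) +
  subject_effect[as.integer(d$subject)] + rnorm(nrow(d))

# One iid variance shared by all subject-by-duration combinations
m_simple <- lmer(response ~ brightness + duration + (1 | subject:duration), data = d)

# Full 3 x 3 covariance matrix of duration effects within subject
m_full <- lmer(response ~ brightness + duration + (duration | subject), data = d)

# Number of covariance parameters (elements of the Cholesky factor)
n_theta_simple <- length(getME(m_simple, "theta"))  # 1
n_theta_full   <- length(getME(m_full, "theta"))    # 6
print(c(simple = n_theta_simple, full = n_theta_full))
```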
