Solved – Maximal model for a linear mixed-effects model for a repeated-measures design

lme4-nlme, mixed model, repeated measures

I have a dataset of a psychological experiment with two within-subject factors. For simplicity, let's assume I'm collecting reaction time (RT) for the stimulus factors of color (red/blue/green) and shape (square/triangle). I'm interested both in the main effects and in the interaction.

For each subject, only the mean RT of each of the six conditions is available (i.e. 6 data points per subject). Traditionally, such a dataset would have been analyzed with a repeated-measures ANOVA with two factors. However, many of the observations are missing at random, so I want to use a mixed-effects model instead. Naively translating the repeated-measures ANOVA, as I understand it, into mixed-effects terms, I get (lmer syntax)

Y ~ color*shape + (1|subject)
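To make this concrete, here is a toy version of such a dataset (all names and numbers are made up) together with the corresponding lmer call:

library(lme4)

# illustrative simulated data: 30 subjects, one mean RT per color x shape cell,
# with roughly 15% of the cells missing at random
set.seed(1)
dat <- expand.grid(subject = factor(1:30),
                   color   = factor(c("red", "blue", "green")),
                   shape   = factor(c("square", "triangle")))
subj_icpt  <- rnorm(30, sd = 40)   # per-subject baseline shift
subj_shape <- rnorm(30, sd = 20)   # per-subject shape effect
dat$RT <- 500 +
  20 * (dat$color == "red") +
  30 * (dat$shape == "triangle") +
  subj_icpt[as.integer(dat$subject)] +
  subj_shape[as.integer(dat$subject)] * (dat$shape == "triangle") +
  rnorm(nrow(dat), sd = 30)
dat <- dat[runif(nrow(dat)) > 0.15, ]   # drop some cells at random

# the random-intercept-only model from the formula above
m_naive <- lmer(RT ~ color * shape + (1 | subject), data = dat)
summary(m_naive)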

However, I suspect that this model isn't maximal in the sense implied here. I can add color and shape random slopes:

Y ~ color*shape + (color|subject) + (shape|subject) + (1|subject)

but their meaning and justification in this context of discrete-level factors aren't clear to me at all.

  1. I'd appreciate a principled explanation of what the right maximal model is for this simple case and why, or at least a helpful reference for that 'why'.

  2. If I added a third BETWEEN-subjects factor (e.g. subjects' sex), would modelling it require a different logic?

Best Answer

The maximal structure would also need to include a random effect for the interaction between color and shape, that is:

Y ~ color * shape + (color + shape + color:shape | subject)

This results in all your predictors (color, shape, and their interaction) having both a fixed effect (constant across subjects) and a random effect (individual fluctuations around the estimated fixed effect). In this sense the model is maximal. Note that it might not be fully equivalent to a repeated-measures ANOVA, as it doesn't make equally strict assumptions about the correlational structure (see Tom's answer).

If you don't include the interaction in the random-effects part of the formula, individual variation in the interaction effect will not be treated as "random", and the model will not be equivalent to a repeated-measures ANOVA. Of course, the variance of the random deviates for the interaction (or any other random effect) might be so small that including it in the model does not improve the fit much. You can check this not only with the AIC but also with a likelihood ratio test, since models with and without a given random effect are nested within one another. In principle, if the likelihood ratio test is not significant, you can safely remove that random effect. Simplifying the random-effects structure by removing negligible components is an example of what the article you linked calls the data-driven approach.
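For instance, with the toy data simulated in the question, testing whether a by-subject random effect of color is worth keeping could look like this (anova() refits both models with maximum likelihood before computing the likelihood ratio test):

# model with vs. without a by-subject random effect of color
m_slope <- lmer(RT ~ color * shape + (color | subject), data = dat)
m_icpt  <- lmer(RT ~ color * shape + (1 | subject), data = dat)
anova(m_slope, m_icpt)   # likelihood ratio test of the nested models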

Alternatively, you can simplify the model as follows, and it would still be equivalent to a repeated-measures ANOVA:

Y ~ color*shape + (1|subject) + (0+color|subject) + (0+shape|subject) + (0+color:shape|subject)

This syntax tells lmer not to estimate the correlations between the different random effects (e.g. between subjects' intercepts and their color effects). The drawback is that, for example, you won't be able to tell whether subjects with a large effect of color also tend to have a large effect of shape (or a smaller one, in the case of a negative correlation).
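As a runnable sketch with the toy data from the question (the (0 + color:shape | subject) term is omitted here because, with only one mean per subject and cell, lmer's default checks flag it as unidentifiable):

# intercept, color and shape enter as separate, uncorrelated variance components
m_uncorr <- lmer(RT ~ color * shape + (1 | subject) +
                   (0 + color | subject) + (0 + shape | subject),
                 data = dat)
VarCorr(m_uncorr)   # each term appears as its own block, with no correlations
                    # estimated between the blocks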

You can easily include a between-subjects predictor; the only difference is that you can't add a by-subject random effect for it. "gender", for example, cannot have a random effect grouped by subject, because it doesn't vary within a subject, but it can interact with the other fixed effects, e.g.:

Y ~ color * shape * gender + (color + shape + color:shape | subject)
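Continuing the toy example, a hypothetical gender variable could be added and modelled like this (the random part is trimmed relative to the formula above, since one mean per cell cannot support the full structure):

# hypothetical between-subjects variable: one value per subject,
# constant across that subject's rows
set.seed(2)
gender_by_subject <- sample(c("female", "male"), 30, replace = TRUE)
dat$gender <- factor(gender_by_subject[as.integer(dat$subject)])

# gender enters only the fixed-effects part; there is no (gender | subject) term
m_between <- lmer(RT ~ color * shape * gender + (color + shape | subject),
                  data = dat)
summary(m_between)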