Solved – Maximal model for a linear mixed-effects model for a repeated-measures design

lme4-nlme, mixed model, repeated measures

I have a dataset of a psychological experiment with two within-subject factors. For simplicity, let's assume I'm collecting reaction time (RT) for the stimulus factors of color (red/blue/green) and shape (square/triangle). I'm interested both in the main effects and in the interaction.

For each subject, only the mean RT of each of the six conditions is available (i.e. 6 data points per subject). Traditionally, such a dataset would have been analyzed with a repeated-measures ANOVA with two factors. However, many of the observations are missing at random, so I want to use a mixed-effects model instead. Naively translating the repeated-measures ANOVA, as I understand it, into mixed-effects terms, I get (lmer syntax)

Y ~ color*shape + (1|subject)
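To make this concrete, here is a toy version of such a dataset (all names and numbers are made up) together with the corresponding lmer call:

library(lme4)

# illustrative simulated data: 30 subjects, one mean RT per color x shape cell,
# with roughly 15% of the cells missing at random
set.seed(1)
dat <- expand.grid(subject = factor(1:30),
                   color   = factor(c("red", "blue", "green")),
                   shape   = factor(c("square", "triangle")))
subj_icpt  <- rnorm(30, sd = 40)   # per-subject baseline shift
subj_shape <- rnorm(30, sd = 20)   # per-subject shape effect
dat$RT <- 500 +
  20 * (dat$color == "red") +
  30 * (dat$shape == "triangle") +
  subj_icpt[as.integer(dat$subject)] +
  subj_shape[as.integer(dat$subject)] * (dat$shape == "triangle") +
  rnorm(nrow(dat), sd = 30)
dat <- dat[runif(nrow(dat)) > 0.15, ]   # drop some cells at random

# the random-intercept-only model from the formula above
m_naive <- lmer(RT ~ color * shape + (1 | subject), data = dat)
summary(m_naive)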

However, I suspect that this model isn't maximal in the sense implied here. I can add color and shape random slopes:

Y ~ color*shape + (color|subject) + (shape|subject) + (1|subject)

but their meaning and justification in this context of discrete-level factors aren't clear to me at all.

  1. I'd appreciate a principled explanation of what the right maximal model is for this simple case and why, or at least a helpful reference for that 'why'.

  2. If I added a third BETWEEN-subjects factor (e.g. subjects' sex), would modelling it require a different logic?

Best Answer

The maximal structure would also need to include a random effect for the interaction between color and shape, that is:

Y ~ color * shape + (color + shape + color:shape | subject)

This results in all your predictors (color, shape, and their interaction) having both a fixed effect (constant across subjects) and a random effect (individual fluctuations around the estimated fixed effect). In this sense the model is maximal. Note that it might not be fully equivalent to a repeated-measures ANOVA, as it doesn't make equally strict assumptions about the correlational structure (see Tom's answer).

If you don't include the interaction in the random-effects part of the formula, individual variation in the interaction effect will not be treated as "random", and the model will not be equivalent to a repeated-measures ANOVA. Of course, the variance of the random deviates for the interaction (or any other random effect) might be so small that including it in the model does not improve the fit much. You can check this not only with the AIC but also with a likelihood ratio test, since models with and without a given random effect are nested within one another. In principle, if the likelihood ratio test is not significant, you can safely remove that random effect. Simplifying the random-effects structure by removing negligible components is an example of what the article you linked calls the data-driven approach.
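For instance, with the toy data simulated in the question, testing whether a by-subject random effect of color is worth keeping could look like this (anova() refits both models with maximum likelihood before computing the likelihood ratio test):

# model with vs. without a by-subject random effect of color
m_slope <- lmer(RT ~ color * shape + (color | subject), data = dat)
m_icpt  <- lmer(RT ~ color * shape + (1 | subject), data = dat)
anova(m_slope, m_icpt)   # likelihood ratio test of the nested models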

Alternatively, you can simplify the model as follows, and it would still be equivalent to a repeated-measures ANOVA:

Y ~ color*shape + (1|subject) + (0+color|subject) + (0+shape|subject) + (0+color:shape|subject)

This syntax tells lmer not to estimate the correlations between the different random effects (e.g. between subjects' intercepts and their color effects). The drawback is that, for example, you won't be able to tell whether subjects with a large effect of color also tend to have a large effect of shape (or a smaller one, in the case of a negative correlation).
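As a runnable sketch with the toy data from the question (the (0 + color:shape | subject) term is omitted here because, with only one mean per subject and cell, lmer's default checks flag it as unidentifiable):

# intercept, color and shape enter as separate, uncorrelated variance components
m_uncorr <- lmer(RT ~ color * shape + (1 | subject) +
                   (0 + color | subject) + (0 + shape | subject),
                 data = dat)
VarCorr(m_uncorr)   # each term appears as its own block, with no correlations
                    # estimated between the blocks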

You can easily include a between-subjects predictor; the only difference is that you can't add a by-subject random effect for it. "gender", for example, cannot have a random effect grouped by subject, because it doesn't vary within a subject, but it can interact with the other fixed effects, e.g.:

Y ~ color * shape * gender + (color + shape + color:shape | subject)
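Continuing the toy example, a hypothetical gender variable could be added and modelled like this (the random part is trimmed relative to the formula above, since one mean per cell cannot support the full structure):

# hypothetical between-subjects variable: one value per subject,
# constant across that subject's rows
set.seed(2)
gender_by_subject <- sample(c("female", "male"), 30, replace = TRUE)
dat$gender <- factor(gender_by_subject[as.integer(dat$subject)])

# gender enters only the fixed-effects part; there is no (gender | subject) term
m_between <- lmer(RT ~ color * shape * gender + (color + shape | subject),
                  data = dat)
summary(m_between)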