Solved – comparison between groups by mixed effect model

Tags: lme4-nlme, MATLAB, mixed model, r

I would like to fit a mixed-effects model using "(g)lmer" in R and/or "fit(g)lme" in MATLAB. However, I am not experienced with either, so please tell me whether my approach is correct.

I need to compare behavioral outcomes (e.g., reaction times in response to stimulus onset) between two groups of subjects (e.g., 20 controls and 20 patients).

The task that subjects performed has the following parameters:

Task difficulty (0: easy, 1: difficult)
Duration before stimulus onset (continuous value)

Each subject performed the task multiple times (e.g., 100 trials). On each trial, task difficulty and duration were chosen randomly.

The following parameters are also necessary:

Age of each subject (continuous value)
Group (0: control, 1: patient)

So, I made the following formula:

reaction_time ~ (Task + Duration + Age) * Group + (Task + Duration | subject)

I want to conclude that controls and patients differ if the coefficients of the terms containing "Group" (i.e., Group, Task:Group, Duration:Group, and Age:Group) are significant.

Is the above formula correct to say that?

Best Answer

The formula looks OK for your experimental question. You might even consider more complex models (e.g. with the interaction Task:Duration). Note that the more complex the model is (the more parameters it has), the more likely you are to get convergence warnings. (A convergence warning might indicate that the model is too complex for your data; if that happens, you should probably go back to the simpler model.)
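As a minimal sketch of how the formula could be fitted with lmer, the example below runs it on simulated data. Every number in the data-generating step (effect sizes, variances, age range) is an invented assumption, and the column names RT, Age_c and Duration_c are hypothetical. Centering Age and Duration is a design choice added here, not part of the question: with interactions in the model, the Group coefficient is the group difference when the other predictors are zero, so centering moves that reference point to the sample mean.

```r
library(lme4)

set.seed(1)
n_subj  <- 40    # 20 controls + 20 patients
n_trial <- 100   # trials per subject

# Subject-level variables (all values simulated, purely illustrative)
subj <- data.frame(
  subject = factor(seq_len(n_subj)),
  Group   = rep(c(0, 1), each = n_subj / 2),
  Age     = round(runif(n_subj, 20, 70)),
  b0      = rnorm(n_subj, 0, 0.05),  # random intercepts (s)
  b_task  = rnorm(n_subj, 0, 0.03),  # random Task slopes (s)
  b_dur   = rnorm(n_subj, 0, 0.05)   # random Duration slopes (s per s)
)

# Trial-level design: Task and Duration drawn at random on every trial
d <- subj[rep(seq_len(n_subj), each = n_trial), ]
d$Task     <- rbinom(nrow(d), 1, 0.5)
d$Duration <- runif(nrow(d), 0.5, 2)

# Assumed data-generating model, with Group and Task:Group effects built in
d$RT <- 0.50 + d$b0 + (0.10 + d$b_task) * d$Task +
  (-0.02 + d$b_dur) * d$Duration + 0.001 * d$Age +
  0.05 * d$Group + 0.04 * d$Task * d$Group +
  rnorm(nrow(d), 0, 0.08)

# Center the continuous predictors
d$Age_c      <- d$Age - mean(d$Age)
d$Duration_c <- d$Duration - mean(d$Duration)

fit <- lmer(RT ~ (Task + Duration_c + Age_c) * Group +
              (Task + Duration_c | subject), data = d)
fixef(fit)   # the four Group-related terms are the ones of interest
```

Note that Age and Group vary only between subjects, so they appear in the fixed part but cannot have random slopes by subject.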

I would also suggest that you don't limit your analysis to simply checking whether a particular coefficient "is significant", but rather you estimate and report a 95% confidence interval for each coefficient. I don't know about Matlab, but in R it is pretty easy to estimate robust confidence intervals using bootstrap (check the confint.merMod function in the package lme4).
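A short sketch of bootstrap confidence intervals with confint.merMod follows. It uses the sleepstudy data that ships with lme4 as a stand-in, not the reaction-time data from the question, and nsim = 200 is an arbitrary small value chosen so that the example runs quickly; more replicates would be sensible in a real analysis.

```r
library(lme4)

set.seed(123)  # the parametric bootstrap is random, so fix the seed

# Stand-in model on lme4's built-in sleepstudy data
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Parametric bootstrap intervals for the fixed effects only ("beta_")
ci <- confint(fit, parm = "beta_", method = "boot", nsim = 200, level = 0.95)
ci
```

The result is a matrix with one row per fixed effect and columns for the lower and upper bounds. Calling confint with method = "profile" or method = "Wald" gives faster alternatives.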

Finally, reaction times usually have a very asymmetrical (right-skewed) distribution, which can be a problem for a linear model. You should check the residuals of the model, for example with a normal quantile-quantile plot and a scatterplot of residuals vs fitted values. If you see relevant deviations from normality in the distribution of residuals, you could try to transform your data (using for example a log or a reciprocal transformation; have a look at this article by Kliegl and others for an example of this approach). Alternatively, you could use a generalized linear mixed-effects model (GLMM) instead of a linear one, with a Gamma or an inverse Gaussian distribution family and a suitable link function; have a look at this other article for an example of this approach.
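The checks described above could look roughly like the sketch below. The data are simulated log-normal reaction times with invented parameters, the model is deliberately reduced to one predictor and a random intercept, and skew is a small helper defined here for illustration, not an lme4 function.

```r
library(lme4)

set.seed(2)
n_subj  <- 40
n_trial <- 50
d <- data.frame(
  subject = factor(rep(seq_len(n_subj), each = n_trial)),
  Task    = rbinom(n_subj * n_trial, 1, 0.5)
)
b0 <- rnorm(n_subj, 0, 0.1)  # subject intercepts on the log scale

# Log-normal RTs in seconds: right-skewed by construction (assumed values)
d$RT <- exp(-0.7 + b0[as.integer(d$subject)] + 0.15 * d$Task +
              rnorm(nrow(d), 0, 0.3))

fit_raw <- lmer(RT ~ Task + (1 | subject), data = d)       # untransformed
fit_log <- lmer(log(RT) ~ Task + (1 | subject), data = d)  # log-transformed

# Sample skewness of the residuals (hypothetical helper)
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3
c(raw = skew(resid(fit_raw)), log = skew(resid(fit_log)))

# Graphical checks: normal Q-Q plot and residuals vs fitted values
qqnorm(resid(fit_raw)); qqline(resid(fit_raw))
plot(fitted(fit_raw), resid(fit_raw), xlab = "Fitted", ylab = "Residual")

# GLMM alternative: Gamma family with a log link, RT left untransformed
fit_gamma <- glmer(RT ~ Task + (1 | subject), data = d,
                   family = Gamma(link = "log"))
```

In the Gamma model, coefficients are on the log scale of the mean reaction time, so they are interpreted as multiplicative effects.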

Related Solutions

Solved – Nested mixed effects with lme4

I would say

response ~ brightness+duration+(duration|subject)

would probably be a little better. (The simpler (1|duration:subject) model is not necessarily wrong, but might be oversimplified. If I were a peer reviewer of this work I would certainly ask for a justification of the simpler model ...) The (duration|subject) model is a "random-slopes" model, more or less (although if you have coded duration as a categorical (factor or ordered factor) variable the thing that varies randomly among subjects is not a slope per se, but a between-duration difference). The specification you have ((1|subject:duration)) assumes all subject-duration effects are drawn from a single (iid) Normal distribution; (duration|subject) assumes that the duration effects for a single individual are drawn from a $3 \times 3$ multivariate Normal distribution.

More precisely: the random effect specification (1|subject:duration) gives the following model for the conditional modes/BLUPs of subject $s$ for duration $d$ (or duration effect $d$, depending on how the model is parameterized):

$$ b_{sd} \sim \textrm{Normal}(0,\sigma_{sd}^2) $$

whereas (duration|subject) gives

$$ \begin{split} b_{s\cdot} & \sim \textrm{MVN}( \mathbf 0,\Sigma) \\ \Sigma & = \left( \begin{array}{ccc} \sigma^2_1 & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma^2_{2} & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma^2_3 \end{array} \right) \end{split} $$

i.e., the different duration levels each have different among-subject variances, and the among-subject variation in different duration levels is correlated ($\Sigma$ is a general symmetric positive (semi)definite matrix). To get back to the previous model you would need to restrict $\sigma_1^2=\sigma_2^2=\sigma_3^2=\sigma_{sd}^2$ and set all of the off-diagonal elements to zero.
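Written out, and assuming the parameterization in which each element of $b_{s\cdot}$ is the effect of one duration level (as with (0+duration|subject)), that restriction reduces the six free parameters of $\Sigma$ (three variances and three covariances) to a single variance:

$$ \Sigma = \sigma_{sd}^2 I_3 = \left( \begin{array}{ccc} \sigma_{sd}^2 & 0 & 0 \\ 0 & \sigma_{sd}^2 & 0 \\ 0 & 0 & \sigma_{sd}^2 \end{array} \right) $$

so each $b_{sd}$ is again an independent draw from $\textrm{Normal}(0,\sigma_{sd}^2)$.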

Solved – Maximal model for linear mixed-effects model for repeated measures design

The maximal structure would also need to include a random effect for the interaction between color and shape, that is:

Y ~ color * shape + (color + shape + color:shape | subject)

This will result in all your predictors (color, shape and their interaction) having a fixed effect (constant for all subjects), and a random effect (individual fluctuations around the estimated fixed effect). In this sense the model is the maximal one. Note that it might not be fully equivalent to a repeated-measures ANOVA as it doesn't make equally strict assumptions on the correlational structure (see Tom's answer).

If you don't include the interaction in the random effect part of the formula, individual variation in the interaction effect will not be considered as "random", and the model will not be equivalent to a repeated-measures ANOVA. Of course, the variance of the random deviates for the interaction (or any other random effect) might be so small that including it in the model does not improve the fit much. You can check this not only with the AIC but also with a likelihood ratio test, since the models with and without a given random effect are nested. In principle, if the likelihood ratio test is not significant, you can safely remove that random effect. Simplifying the random effect structure by removing negligible components would be an example of what the article you linked calls a data-driven approach.
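A likelihood ratio test between nested random-effect structures can be sketched as follows. The example uses lme4's built-in sleepstudy data as a stand-in for the color/shape design, so the model terms are illustrative only. Because the two models share the same fixed effects, refit = FALSE keeps the REML fits instead of refitting with maximum likelihood.

```r
library(lme4)

# Stand-in models: with and without a random slope for Days
m_full    <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
m_reduced <- lmer(Reaction ~ Days + (1 | Subject),    data = sleepstudy)

# Likelihood ratio test; AIC and BIC are reported in the same table
lrt <- anova(m_reduced, m_full, refit = FALSE)
lrt
```

Here the comparison has 2 degrees of freedom, because dropping the slope removes both its variance and its correlation with the intercept. Since a variance of zero lies on the boundary of the parameter space, the p-value from this test tends to be conservative.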

You can simplify the model in this way, and it would still be equivalent to a repeated-measures ANOVA:

Y ~ color*shape + (1|subject) + (0+color|subject) + (0+shape|subject) + (0+color:shape|subject)

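This syntax tells lmer not to estimate the correlations between the random effects. The drawback here is that, for example, you won't be able to tell whether subjects that have a large effect of color also tend to have a larger effect of shape (or a smaller effect, in case of a negative correlation). Note that this works as intended only when color and shape are coded as numeric (e.g. contrast-coded) variables; if they are factors, lmer still estimates the correlations among the levels within each term.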
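For a numeric predictor, the split-term syntax and lme4's double-bar shorthand describe the same uncorrelated model. The sketch below illustrates this on the built-in sleepstudy data, used here as a stand-in because its predictor Days is numeric.

```r
library(lme4)

# Uncorrelated random intercept and slope, written out term by term
m_long  <- lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject),
                data = sleepstudy)

# The same model using the double-bar shorthand
m_short <- lmer(Reaction ~ Days + (Days || Subject), data = sleepstudy)

VarCorr(m_long)   # two variance components, no correlation column
c(long = as.numeric(logLik(m_long)), short = as.numeric(logLik(m_short)))
```

Both fits should report the same log-likelihood, since the shorthand is expanded into the separate terms before fitting.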

You can easily include a between-subjects predictor; the only difference is that you can't add a random effect for it. "gender", for example, cannot have a random effect grouped by subject, but it can interact with the other fixed effects, e.g.:

Y ~ color * shape * gender + (color + shape + color:shape | subject)