According to Lo and Andrews (2015, https://doi.org/10.3389/fpsyg.2015.01171), raw reaction times (RTs) should be analyzed with a GLMM, rather than transformed values with an LMM or even ANOVA. They and others point out that a Gamma or inverse Gaussian family with an identity link function should be used.
My data are RTs measured before and after sleep (~2200h and ~0730h). Participants (n = 22) were asked to press a button every time a cross was displayed on the screen. This was measured multiple times per participant, at each timepoint, and in two conditions: placebo and intervention (in the intervention condition, participants received a stimulus that affects their sleep pattern).
head(data), with RTs trimmed to the 100–2000 ms range:
subjectNumber expDay age bmi weight height treatment waiting reaction timep
2 N1 24 22.53 73 180 Control 6026 588 Before sleep
2 N1 24 22.53 73 180 Control 4470 326 Before sleep
2 N1 24 22.53 73 180 Control 2334 336 Before sleep
2 N1 24 22.53 73 180 Control 6005 289 Before sleep
2 N1 24 22.53 73 180 Control 4636 318 Before sleep
2 N1 24 22.53 73 180 Control 3515 315 Before sleep
I'm interested in knowing if my intervention improved the reaction time.
It's possible that people perform better in the morning regardless of any intervention (being less tired), so the model should take this into account.
An interaction term (intervention*timepoint) is needed and will indicate whether the intervention had any effect after sleep, given that before sleep there should be no difference.
So far my model looks like this:
glmer(reaction ~ treatment * timep + (1|subjectNumber), data=., family = inverse.gaussian(link = "identity"))
summary():
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: inverse.gaussian ( identity )
Formula: reaction ~ treatment * trial + (1 | subjectNumber)
Data: .
AIC BIC logLik deviance df.resid
33283.9 33320.1 -16636.0 33271.9 3064
Scaled residuals:
Min 1Q Median 3Q Max
-3.1321 -0.4575 -0.1539 0.2122 17.8238
Random effects:
Groups Name Variance Std.Dev.
subjectNumber (Intercept) 4.071e+02 20.17610
Residual 1.404e-04 0.01185
Number of obs: 3070, groups: subjectNumber, 20
Fixed effects:
Estimate Std. Error t value Pr(>|z|)
(Intercept) 341.394 12.128 28.150 < 2e-16 ***
treatmentIntervention 4.614 2.709 1.703 0.08860 .
timepAfter sleep 7.745 2.763 2.803 0.00507 **
treatmentStimulation:trialAfter sleep -3.636 3.895 -0.933 0.35058
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) trtmnS trlAfs
trtmntStmlt -0.089
trilAftrslp -0.083 0.467
trtmntSt:As 0.048 -0.685 -0.701
The AIC is ridiculously high compared to other RT models I've seen (in the range of 100–300), so with my limited experience with these models I'm unsure how good the fit is.
Is it correct to say, based on the above, that the treatment has no effect on RT, but that sleep in general improves RT regardless of treatment? I'm afraid this sleep effect may confound things in a way that makes the design invalid in the first place.
Best Answer
The high AIC value is not a problem in itself: the AIC is a measure of the relative quality of a model; it says something about how good the fit of a model is, but only with respect to another model fit on the same dataset. You cannot compare it to other RT models if these were not fit on the same dataset, because, for example, the AIC also depends on the number of observations.
The residual plot looks fine to me: the fitted values are discretized into specific values because of the categorical nature of your predictor variables (time and treatment). More specifically, because your predictors are two categorical variables with two levels each, the model can only predict a different value for each combination of the two variables, hence it will output four different values for each subject. It would be different if you also had a continuous predictor, e.g. the number of hours slept.
The only issue with the residual plot may be that the distribution of residuals seems slightly skewed toward positive values. This is because reaction time distributions are almost always skewed, and could in principle be fixed with a parametric transformation of the dependent variable, such as the Box-Cox power transform. However, since you prefer to avoid transformations, you may also want to try fitting the data with a Gamma family (keeping the identity link), to see if the distribution of residuals improves. You can choose the preferred model (Gamma vs. inverse Gaussian) based on the AIC, as the two models are fit on the same dataset.
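A minimal sketch of that comparison (assuming your data frame is called `data` and uses the variable names from your `head()` output):

```r
library(lme4)

# Same fixed- and random-effects structure, two candidate families
m_invgauss <- glmer(reaction ~ treatment * timep + (1 | subjectNumber),
                    data = data,
                    family = inverse.gaussian(link = "identity"))

m_gamma <- glmer(reaction ~ treatment * timep + (1 | subjectNumber),
                 data = data,
                 family = Gamma(link = "identity"))

# Both models are fit to the same dataset, so their AICs are comparable;
# the lower AIC indicates the preferred family
AIC(m_invgauss, m_gamma)
```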
Contrary to the previous answer, I think it can make sense to include the interaction term treatment:timep in the model (or at least to test it, then remove it afterward if it does not improve the fit). Without the interaction, the model can only predict a purely additive combination of the effects of sleep and treatment: in other words, the model assumes that the effect of treatment is the same regardless of whether the improvement in reaction times due to sleep was large or small. This assumption, however, may not hold: for example, the effect of treatment may be larger for those subjects who show a smaller benefit from sleep alone (this would make sense if you consider that there may be a ceiling effect on the reaction time benefit, i.e. if reaction times cannot improve beyond a certain amount). The interaction term treatment:timep would allow you to model such effects.
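One way to test the interaction (a sketch, again assuming your data frame and variable names) is a likelihood ratio test between the additive and the interaction model:

```r
library(lme4)

# Additive model: effects of treatment and sleep combine independently
m_additive <- glmer(reaction ~ treatment + timep + (1 | subjectNumber),
                    data = data,
                    family = inverse.gaussian(link = "identity"))

# Interaction model: the treatment effect may differ before vs. after sleep
m_interact <- glmer(reaction ~ treatment * timep + (1 | subjectNumber),
                    data = data,
                    family = inverse.gaussian(link = "identity"))

# Likelihood ratio test (1 df for the interaction term); if it is not
# significant and the AIC does not improve, you can drop the interaction
anova(m_additive, m_interact)
```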
One last comment: your model only includes random intercepts for each subject. This means that while the marginal mean of the RTs is allowed to vary across participants, the effects of sleep and treatment are not. If you suspect that these effects may vary substantially across subjects, it may be worth fitting random slopes for them. For example, if you want to model individual differences in the effect of sleep as a random effect, the random-effects part of the formula would be (1 + timep | subjectNumber). Of course, this will increase the complexity of the model and the number of parameters estimated, and may result in issues during estimation (e.g. convergence warnings). You could start from the 'maximal' model, with all possible random effects included (the formula would be (treatment * timep | subjectNumber)), and then proceed by simplifying it if the estimation does not converge. For some advice on how to choose the right level of complexity given the data, I suggest you check this article by Bates & colleagues.
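As a sketch (same assumed data frame and variable names), the random-slope and maximal variants would look like:

```r
library(lme4)

# Random intercept plus a by-subject random slope for the effect of sleep
m_slope <- glmer(reaction ~ treatment * timep +
                   (1 + timep | subjectNumber),
                 data = data,
                 family = inverse.gaussian(link = "identity"))

# 'Maximal' random-effects structure: by-subject intercepts and slopes for
# both factors and their interaction
m_max <- glmer(reaction ~ treatment * timep +
                 (treatment * timep | subjectNumber),
               data = data,
               family = inverse.gaussian(link = "identity"))

# If m_max fails to converge, simplify stepwise, e.g. drop the interaction
# slope first, then individual slopes, until estimation is stable
```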