I ran a repeated-measures design in which I tested 30 males and 30 females across three different tasks. I want to understand how the behaviour of males and females differs and how that depends on the task. I used both the nlme (lme) and lme4 (lmer) packages to investigate this; however, I am stuck trying to check the assumptions for either method. The code I run is
lm.full <- lmer(behaviour ~ task*sex + (1|ID/task), REML=FALSE, data=dat)
lm.full2 <-lme(behaviour ~ task*sex, random = ~ 1|ID/task, method="ML", data=dat)
I checked whether the interaction was needed by comparing the full model with the simpler model without the interaction, using anova():
lm.base1 <- lmer(behaviour ~ task+sex+(1|ID/task), REML=FALSE, data=dat)
lm.base2 <- lme(behaviour ~ task+sex, random= ~1|ID/task, method="ML", data=dat)
anova(lm.base1, lm.full)
anova(lm.base2, lm.full2)
Q1: Is it ok to use these categorical predictors in a linear mixed model?
Q2: Do I understand correctly that the outcome variable ("behaviour") itself does not need to be normally distributed (across sex/tasks)?
Q3: How can I check homogeneity of variance? For a simple linear model I use plot(LM$fitted.values, rstandard(LM)). Is using plot(resid(lm.base1)) sufficient?
Q4: To check for normality, is the following code ok?
hist((resid(lm.base1) - mean(resid(lm.base1))) / sd(resid(lm.base1)), freq = FALSE); curve(dnorm, add = TRUE)
Best Answer
Q1: Yes - just like any regression model.
Q2: Just like general linear models, your outcome variable does not need to be normally distributed as a univariate variable. However, LME models assume that the residuals of the model are normally distributed. So a transformation of the outcome, or adding weights to the model, would be a way of taking care of this (and checking with diagnostic plots, of course).
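As a minimal sketch of the "adding weights" idea: in nlme you can allow a separate residual variance per task via varIdent() and compare the two fits. All data and object names here (dat2, m0, m1) are hypothetical, simulated just for illustration:

```r
library(nlme)

# Hypothetical data mimicking the design: 30 subjects x 3 tasks,
# with residual variance deliberately larger in task C
set.seed(1)
dat2 <- expand.grid(ID = factor(1:30), task = factor(c("A", "B", "C")))
dat2$sex <- factor(ifelse(as.integer(dat2$ID) <= 15, "M", "F"))
dat2$behaviour <- rnorm(nrow(dat2), mean = 10,
                        sd = ifelse(dat2$task == "C", 3, 1))

# Constant-variance fit vs. a fit with a separate residual
# variance per task (the weights argument)
m0 <- lme(behaviour ~ task * sex, random = ~ 1 | ID,
          data = dat2, method = "ML")
m1 <- update(m0, weights = varIdent(form = ~ 1 | task))

# If the heteroscedastic model fits clearly better, the
# constant-variance assumption is questionable
anova(m0, m1)
```

(For comparing variance structures, REML fits are often preferred; ML is used here only to match the question's code.)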
Q3:
plot(myModel.lme)
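For an lme fit, plot() shows standardized residuals against fitted values by default, which is the mixed-model analogue of the plot(LM$fitted.values, rstandard(LM)) check. A sketch on simulated data (all names hypothetical; two replicates per ID-task cell so the nested random effect is identifiable):

```r
library(nlme)

# Hypothetical data: 30 subjects x 3 tasks x 2 replicates
set.seed(42)
dat2 <- expand.grid(ID = factor(1:30), task = factor(c("A", "B", "C")),
                    rep = 1:2)
dat2$sex <- factor(ifelse(as.integer(dat2$ID) <= 15, "M", "F"))
dat2$behaviour <- rnorm(nrow(dat2), mean = 10)

fit <- lme(behaviour ~ task * sex, random = ~ 1 | ID/task,
           data = dat2, method = "ML")

# Default diagnostic: standardized residuals vs. fitted values;
# a fan shape would suggest heteroscedasticity
plot(fit)

# Roughly equivalent by hand, using Pearson residuals
plot(fitted(fit), resid(fit, type = "pearson"))
```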
Q4:
qqnorm(myModel.lme, ~ranef(., level=2))
. This code will allow you to make QQ plots for each level of the random effects. LME models assume that not only the within-cluster residuals are normally distributed, but that the random effects at each level are as well. Vary the level from 0, 1, to 2 so that you can check the ID, task, and within-subject residuals.

EDIT: I should also add that while normality is assumed, and while transformation likely helps reduce problems with non-normal errors/random effects, it's not clear that all problems are actually resolved or that bias isn't introduced. If your data require a transformation, then be cautious about the estimation of the random effects. Here's a paper addressing this.
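Putting this together for the design in the question, a sketch of cycling through the levels (simulated data; all names hypothetical):

```r
library(nlme)

# Hypothetical data: 30 subjects x 3 tasks x 2 replicates
set.seed(7)
dat2 <- expand.grid(ID = factor(1:30), task = factor(c("A", "B", "C")),
                    rep = 1:2)
dat2$sex <- factor(ifelse(as.integer(dat2$ID) <= 15, "M", "F"))
dat2$behaviour <- rnorm(nrow(dat2), mean = 10)

fit <- lme(behaviour ~ task * sex, random = ~ 1 | ID/task,
           data = dat2, method = "ML")

# Within-subject residuals (Pearson-standardized)
qqnorm(fit, ~ resid(., type = "p"))

# Random effects at each grouping level:
# level = 1 -> ID intercepts, level = 2 -> task-within-ID intercepts
qqnorm(fit, ~ ranef(., level = 1))
qqnorm(fit, ~ ranef(., level = 2))
```

With 30 subjects and 3 tasks nested in each, level 1 gives 30 ID intercepts and level 2 gives 90 task-within-ID intercepts.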