Mixed Models in R – Checking Assumptions in lmer/lme

Tags: assumptions, lme4-nlme, mixed-model, r

I ran a repeated-measures design in which I tested 30 males and 30 females across three different tasks. I want to understand how the behaviour of males and females differs and how that depends on the task. I used both the lmer function (from the lme4 package) and the lme function (from the nlme package) to investigate this; however, I am stuck trying to check the assumptions for either method. The code I run is

lm.full <- lmer(behaviour ~ task*sex + (1|ID/task), REML=FALSE, data=dat)
lm.full2 <-lme(behaviour ~ task*sex, random = ~ 1|ID/task, method="ML", data=dat)

I checked whether the model with the interaction was the better model by comparing it with the simpler model without the interaction, using anova():

lm.base1 <- lmer(behaviour ~ task+sex+(1|ID/task), REML=FALSE, data=dat)
lm.base2 <- lme(behaviour ~ task+sex, random = ~1|ID/task, method="ML", data=dat)
anova(lm.base1, lm.full)
anova(lm.base2, lm.full2)

Q1: Is it ok to use these categorical predictors in a linear mixed model?
Q2: Do I understand correctly that the outcome variable ("behaviour") itself does not need to be normally distributed (across sex/tasks)?
Q3: How can I check homogeneity of variance? For a simple linear model I use plot(LM$fitted.values, rstandard(LM)). Is using plot(resid(lm.base1)) sufficient?
Q4: To check for normality is using the following code ok?

hist((resid(lm.base1) - mean(resid(lm.base1))) / sd(resid(lm.base1)), freq = FALSE)
curve(dnorm, add = TRUE)

Best Answer

Q1: Yes - just like any regression model.

Q2: Just like general linear models, your outcome variable does not need to be normally distributed as a univariate variable. However, LME models do assume that the residuals of the model are normally distributed. If they are not, a transformation of the outcome or adding weights to the model would be ways of taking care of this (checking with diagnostic plots afterwards, of course).
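As a sketch of what that check looks like in practice (variable names follow the question; the data frame `dat` with columns behaviour, task, sex, and ID is assumed to exist, and the log transform is only an illustration that makes sense if behaviour is strictly positive):

```r
library(lme4)

# Fit the model from the question
lm.full <- lmer(behaviour ~ task * sex + (1 | ID/task), REML = FALSE, data = dat)

# QQ plot of the within-group residuals: points should fall close to the line
qqnorm(resid(lm.full)); qqline(resid(lm.full))

# If the residuals look right-skewed, refit on a transformed outcome
# and re-check the diagnostics
lm.full.log <- lmer(log(behaviour) ~ task * sex + (1 | ID/task),
                    REML = FALSE, data = dat)
qqnorm(resid(lm.full.log)); qqline(resid(lm.full.log))
```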

Q3: plot(myModel.lme). For an lme fit, the default plot method shows standardized residuals against fitted values, which is exactly the plot you want for checking homogeneity of variance.
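In terms of the question's own objects, a sketch (the conditioning and boxplot variants below are standard nlme plot.lme formulas; `lm.full2` and `dat` are assumed from the question):

```r
library(nlme)

# Standardized residuals vs fitted values: look for a level band
# with no funnel shape
plot(lm.full2)

# Same plot, split by sex, with a reference line at zero
plot(lm.full2, resid(., type = "p") ~ fitted(.) | sex, abline = 0)

# Boxplots of residuals per task, to compare spread across tasks
plot(lm.full2, task ~ resid(.))
```

The lme4 fit has an analogous default: plot(lm.full) also produces a residuals-vs-fitted plot.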

Q4: qqnorm(myModel.lme, ~ranef(., level=2)). This code will allow you to make QQ plots for each level of the random effects. LME models assume not only that the within-cluster residuals are normally distributed, but that each level of the random effects is as well. Vary the level from 0, 1, to 2 so that you can check the subject (ID), task, and within-subject residuals.
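Spelled out for the nested structure in the question (these are standard nlme qqnorm.lme formulas; `lm.full2` is the lme fit from the question):

```r
library(nlme)

# QQ plot of the standardized within-group residuals
qqnorm(lm.full2, ~ resid(., type = "p"))

# Random intercepts at the ID level
qqnorm(lm.full2, ~ ranef(., level = 1))

# Random intercepts at the task-within-ID level
qqnorm(lm.full2, ~ ranef(., level = 2))
```

For the lmer fit, the random effects can be extracted and plotted directly, e.g. qqnorm(ranef(lm.full)$ID[[1]]).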

EDIT: I should also add that while normality is assumed, and a transformation likely helps reduce problems with non-normal errors/random effects, it is not clear that all problems are actually resolved or that no bias is introduced. If your data require a transformation, be cautious about the interpretation of the estimated random effects. Here's a paper addressing this.
