Solved – Violated Normality of Residuals Assumption in Linear Mixed Model

lme4-nlme, mixed model

I have a question about how concerned I should be about a potential violation of the normality of residuals assumption in a linear mixed model. I have a relatively small data set, and after fitting the model (using 'lmer' in R), a Shapiro-Wilk test reveals a significant deviation of the residuals from a normal distribution. Log-transforming my variables does not deal with this satisfactorily.

In my search for how to deal with this, I encountered advice that tests of normality shouldn't be conducted (see the answer to a similar question here). Instead, it's suggested that I generate QQ-plots of random normal data with the same N as my residuals and check whether the QQ-plot of my residuals is markedly different. Other advice I have found suggests that inference is robust to various violations of LMM assumptions (see blog post here).

My Questions

1) If this was your data, would you be concerned about the lack of normality in the LMM residuals (see data & output below)?

2) If you are concerned, are you still concerned after the log-transformation (again, see data & output below)?

3) If the answer is "Yes" to both above, how could I deal with the non-normality of my residuals?

Data & Non-Transformed Analysis

# load relevant library
library(lme4)

#--- declare the data
study <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 6, 6,
           7, 7, 8, 8, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 13, 13, 
           13, 13, 14, 14, 14, 14, 14, 14, 15, 15, 16, 16, 16, 16, 17, 17)

condition <- c(1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 1, 1, 
               1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 
               2, 2, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 2, 2, 1, 1)

age <- rep(c(1, 2), times = length(study) / 2)

congruent <- c(937, 611, 1067, 611, 1053, 943, 1097, 1015, 1155, 974, 860, 594,
               910, 605, 912, 632, 998, 660, 1989, 1176, 1337, 936, 2657, 1234, 
               1195, 999, 1010, 634, 1205, 620, 1154, 909, 1425, 1172, 1388, 
               1084, 641, 407, 1429, 810, 909, 510, 1358, 802, 1132, 639, 
               1501, 703, 1471, 955, 1342, 631, 1178, 676, 1033, 723)

incongruent <- c(1025, 705, 1204, 705, 1119, 1008, 1184, 1046, 1225, 1013, 1308, 
                 895, 1234, 901, 1204, 854, 1177, 828, 2085, 1269, 1350, 929, 
                 2697, 1231, 1233, 1032, 1062, 679, 1263, 674, 1183, 914, 1458, 
                 1184, 1382, 1086, 632, 424, 1510, 871, 978, 568, 1670, 881, 
                 1395, 747, 1694, 795, 1504, 999, 2112, 948, 1494, 992, 1039, 
                 781)

# name the columns explicitly so the model formula finds the factors in `data`
data <- data.frame(study = as.factor(study), condition = as.factor(condition),
                   age, congruent, incongruent)

#--- LMM analysis

# center age (as.numeric() drops the one-column matrix that scale() returns)
data$age <- as.numeric(scale(data$age, center = TRUE, scale = FALSE))

# fit
fit <- lmer(incongruent ~ congruent + (1|study) + (1|condition), 
            data = data, REML = FALSE)

# plot & test the residual
qqnorm(resid(fit))
qqline(resid(fit))
shapiro.test(resid(fit))

Shapiro-Wilk normality test

data:  resid(fit)
W = 0.74417, p-value = 1.575e-08

Non-Transformed QQ-plot

Log-Transformed Data

# do the log transform 
data$congruent <- log(data$congruent)
data$incongruent <- log(data$incongruent)

# fit again
log_fit <- lmer(incongruent ~ congruent + (1|study) + (1|condition), 
                data = data, REML = FALSE)

# plot & test the residual
qqnorm(resid(log_fit))
qqline(resid(log_fit))
shapiro.test(resid(log_fit))

Shapiro-Wilk normality test

data:  resid(log_fit)
W = 0.93241, p-value = 0.003732

Log-transformed QQ-plot

Simulated Normal Distribution QQ-Plots

Following the recommended simulation, the QQ-plot of my log-transformed residuals does not look too dissimilar to ones generated from a true normal distribution with the same sample size as my data (N = 56):

set.seed(42)
n <- length(resid(log_fit))  # one residual per observation (56 here)
par(mfrow = c(3, 3))
for (i in 1:9) {
  x <- rnorm(n)
  qqnorm(x)
  qqline(x)
}

Resulting Figure

Best Answer

My answer to your questions would be (1) "yes" (I would worry a bit about the initial degree of non-Normality), (2) "no" (log-transformation seems to have improved the situation), (3) N/A (since I'm not worried), but a few more things to try if you are worried would be:

  • use robustlmm::rlmer() to do a robust LMM fit;
  • try the fit without the points that give the most extreme residuals (try lattice::qqmath(log_fit, id = 0.1, idLabels = ~.obs) to identify them by observation number) and see if it makes much of a difference;
  • try another transformation to get closer to Normality (although I played around with this a little bit and it doesn't seem to help).
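
The first two suggestions could be sketched roughly as follows. This is only an illustration under assumptions: `sim` is simulated stand-in data (not the data from the question), the model has a single random intercept for simplicity, and the 5% cut-off for "extreme" residuals is an arbitrary choice. The robust fit runs only if the robustlmm package happens to be installed.

```r
library(lme4)

# Simulated stand-in data (hypothetical): 17 studies, 4 observations each
set.seed(1)
n_study <- 17
sim <- data.frame(
  study     = factor(rep(seq_len(n_study), each = 4)),
  congruent = rnorm(n_study * 4, mean = 7, sd = 0.3)
)
study_eff <- rnorm(n_study, sd = 0.1)
sim$incongruent <- 0.5 + 0.95 * sim$congruent +
  study_eff[as.integer(sim$study)] + rnorm(nrow(sim), sd = 0.05)
# Contaminate two observations so there is something to detect
sim$incongruent[c(3, 40)] <- sim$incongruent[c(3, 40)] + 0.4

# Fit on all observations
fit_all <- lmer(incongruent ~ congruent + (1 | study), data = sim, REML = FALSE)

# Drop the observations with the largest absolute residuals (arbitrary 5% cut-off)
r    <- resid(fit_all)
keep <- abs(r) < quantile(abs(r), 0.95)
fit_trim <- lmer(incongruent ~ congruent + (1 | study),
                 data = sim[keep, ], REML = FALSE)

# Compare the fixed effects with and without the extreme points
print(round(cbind(all = fixef(fit_all), trimmed = fixef(fit_trim)), 3))

# Robust alternative, only if robustlmm is available
if (requireNamespace("robustlmm", quietly = TRUE)) {
  fit_rob <- robustlmm::rlmer(incongruent ~ congruent + (1 | study), data = sim)
  print(fit_rob)
}
```

If the fixed-effect estimates barely move between the full, trimmed, and robust fits, the extreme residuals are probably not driving the conclusions.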

I'm a little surprised by the apparent mismatch between your sims (several of the simulated examples look farther from Normality by eye than your residuals do) and the Shapiro test results (fairly strong evidence against the null hypothesis of Normality).
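
One way to probe that mismatch is to simulate the Shapiro-Wilk statistic itself under Normality and see how often it falls as low as the reported value. The sketch below is a rough check, not part of the original analysis: it assumes n = 56 (the number of observations in the data vectors above), uses the W reported for resid(log_fit), and treats the residuals as if they were independent draws, which model residuals are not exactly.

```r
set.seed(42)
n     <- 56        # number of observations in the data vectors above
w_obs <- 0.93241   # Shapiro-Wilk W reported for resid(log_fit)

# Distribution of W for truly normal samples of the same size
w_sim <- replicate(2000, unname(shapiro.test(rnorm(n))$statistic))

# Share of simulated normal samples with W at least as low as the observed one
prop_low <- mean(w_sim <= w_obs)
print(prop_low)
```

A very small proportion would agree with the test's p-value and suggest that nine QQ-plots are simply too few to show how rare such a low W is by eye.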
