# Solved – Missing observations in a linear mixed model

missing data, mixed model

Suppose you are measuring temperature $T_{ij}$ for $i =1, \dots ,4$ subjects and $j= 1, \dots ,4$ time points. For subject 1, suppose $T_{12}$ and $T_{14}$ were missing. Would you omit the entire record for subject 1 if you are running a linear mixed model?

You don't need to omit an individual just because a few of their observations are missing. In fact, you want to keep participants with partial data: doing so increases your power and, as long as the data are missing at random (the missingness depends only on observed values), avoids the bias that dropping incomplete cases can introduce. The nice thing about mixed-effects models is that likelihood-based estimation (ML or REML) uses every observed row, so they handle missing responses pretty well, especially in longitudinal designs.

In the lme4 simulation below, you'll notice that the estimates from the full model (fullMod) and the model with a missing value (missMod) are fairly similar, considering the extremely small sample size. Additionally, if you specify a random slope, you can extract Empirical Bayes estimates using the ranef() function, which gives each participant's estimated deviation from the overall slope; coef() adds the fixed effect back to give an estimated slope for each participant.

Empirical Bayes estimates are calculated using both information from the individual and information from the rest of the sample. For individuals with more extreme observations or with fewer observations (due to missing data), the estimates are pulled toward the mean of the overall sample, a concept known as "shrinkage." There is a pretty good review on growth curves in a mixed-effects framework that can be found here, although the author uses the nlme package rather than lme4.
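To make the shrinkage idea concrete, here is a minimal sketch, not part of the original answer. It assumes the sleepstudy data that ships with lme4 and artificially thins one subject (the choice of subject "308" and of days 0 to 2 is arbitrary, and the names sparseId, dat, and comparison are purely illustrative). It then puts each subject's stand-alone OLS slope next to the Empirical Bayes slope from a random-slope model.

```r
library(lme4)

# Mimic missing data: keep only days 0-2 for one (arbitrarily chosen) subject.
sparseId <- "308"
dat <- subset(sleepstudy, Subject != sparseId | Days %in% 0:2)

# Random intercept and random slope for each subject.
mod <- lmer(Reaction ~ Days + (Days | Subject), data = dat)

# Slope from a separate OLS fit to each subject's own data only.
olsSlope <- sapply(
  split(dat, dat$Subject),
  function(d) coef(lm(Reaction ~ Days, data = d))[["Days"]]
)

# coef() returns fixed effect + ranef() deviation for each subject.
ebSlope <- coef(mod)$Subject[, "Days"]

comparison <- data.frame(
  nObs = as.vector(table(dat$Subject)),  # observations per subject
  ols  = olsSlope,                       # no pooling
  eb   = ebSlope,                        # partial pooling (Empirical Bayes)
  row.names = names(olsSlope)
)

fixef(mod)[["Days"]]  # overall (population) slope
comparison            # compare the ols and eb columns, especially for "308"
```

If the sketch behaves as expected, the eb value for the thinned subject should sit noticeably closer to the overall slope than its ols value does, because that subject contributes only three observations. The subject is still in the model and still gets its own estimate, which is the point of the answer above.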

library(lme4)

# Set the seed to make the code reproducible
set.seed(28)

# Simulate a growth curve for 4 participants, each with 4 time points. Assume a random
# intercept and fixed slope.
simData <- expand.grid(ID = 1:4, Time = 0:3)
simData <- simData[order(simData$ID), ]
randInt <- rnorm(n = 4, mean = 0, sd = 2)
slope <- 2
randError <- rnorm(n = nrow(simData), mean = 0, sd = 2)
response <- c(NA)
for(i in 1:nrow(simData)){
  df <- simData[i, ]
  response[i] <- randInt[df$ID] + df$Time * slope + randError[i]
}
simData$response <- response

# Use lmer to model the growth curve
fullMod <- lmer(response ~ Time + (1 | ID), data = simData)
summary(fullMod)

# Number of obs: 16, groups: ID, 4

# Add in missingness for only one time point
simData[2, 3] <- NA

missMod <- lmer(response ~ Time + (1 | ID), data = simData)
summary(missMod)

# Number of obs: 15, groups: ID, 4
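For the exact pattern in the question (subject 1 missing $T_{12}$ and $T_{14}$), a similar check can be sketched as follows. The data are simulated, so the temperature values, the effect sizes, and the names temps and fit are assumptions for illustration only. The sketch relies on R's default na.action (na.omit), under which lmer drops the rows with a missing response rather than the whole subject.

```r
library(lme4)

set.seed(1)

# Hypothetical data: 4 subjects measured at 4 time points.
temps <- expand.grid(time = 1:4, subject = factor(1:4))
subjectEffect <- rnorm(4, mean = 0, sd = 1)
temps$temperature <- 37 +
  subjectEffect[as.integer(temps$subject)] +
  0.2 * temps$time +
  rnorm(nrow(temps), mean = 0, sd = 0.3)

# Make T_12 and T_14 missing for subject 1.
temps$temperature[temps$subject == "1" & temps$time %in% c(2, 4)] <- NA

# Rows with a missing response are dropped; the subject itself is kept.
fit <- lmer(temperature ~ time + (1 | subject), data = temps)

nobs(fit)   # number of rows actually used in the fit
ngrps(fit)  # number of subjects retained
```

Here nobs() should report 14 rows and ngrps() should still report 4 subjects, so subject 1 contributes its two observed time points to the fit instead of being discarded.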