Solved – Unbalanced linear mixed effect modeling for longitudinal data with lme4

lme4-nlme, mixed model, panel data, r, repeated measures

I'm new to longitudinal analyses, and I'm having trouble formulating a model that accurately reflects my study design. This study recruited subjects for two groups (dx vs. control), with measurements taken for each subject at three different timepoints. Age at baseline varied from subject to subject; from what I've read, this means the design is "unbalanced."

My data frame is organized such that if row n reads:

[subject_ID = x, baseline_age = y, Group = 1, Timepoint = 1, DV = k]

then row n+1 reads:

[subject_ID = x, baseline_age = y, Group = 1, Timepoint = 2, DV = m]

I'm interested in relationships between baseline_age, timepoint, and Group. If baseline_age weren't a factor, I think the R code would be as follows:

mod1 <- lmer(DV ~ timepoint + Group + timepoint:Group + (timepoint|subject_ID), 
             data = mydat)

where (timepoint | subject_ID) reflects the fact that timepoint varies at the individual level. However, assuming the above is correct, my confusion arises when I try to model random effects with baseline_age entered into the equation. Since baseline_age and subject_ID are perfectly correlated, would it be possible to use baseline_age as proxy for subject_ID in lmer? Or should I model a second random effect? Specifically, I'm considering the following three-way interaction model:

mod2 <- lmer(DV ~ timepoint + Group + baseline_age + timepoint:Group + 
                  timepoint:baseline_age + Group:baseline_age + 
                  timepoint:Group:baseline_age + (timepoint | baseline_age), 
             data = mydat)

Best Answer

Your second model does not make sense because observations are not grouped/clustered/nested within baseline_age.

The first model does make sense because observations are grouped/nested/clustered within subject_ID (because you have repeated measures). There is no further clustering as far as I can gather from the description.
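
One way to see the difference is that baseline_age takes a single value per subject, so it behaves as a subject-level covariate rather than as a grouping factor. A minimal check in base R, using a small made-up data frame (the values are hypothetical; only the column names follow the question):

```r
# Hypothetical long-format data: 4 subjects, 3 timepoints each
mydat <- data.frame(
  subject_ID   = factor(rep(1:4, each = 3)),
  baseline_age = rep(c(21.5, 34.0, 28.2, 40.1), each = 3),
  Timepoint    = rep(1:3, times = 4)
)

# Count the distinct baseline_age values within each subject
ages_per_subject <- tapply(mydat$baseline_age, mydat$subject_ID,
                           function(x) length(unique(x)))

# TRUE when baseline_age is constant within every subject,
# i.e. it carries no within-subject information
age_is_subject_level <- all(ages_per_subject == 1)
age_is_subject_level
```

If the check returns TRUE, baseline_age belongs on the fixed-effects side of the formula, and subject_ID remains the only grouping factor.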

So, a good initial model would be

mod2 <- lmer(DV ~ timepoint + Group + baseline_age + (1 | subject_ID), 
         data = mydat)

The coefficient for timepoint provides the linear growth estimate(s) (whether it is coded as a factor or as numeric matters; see below). The coefficient for Group gives the group effect (dx vs. control), controlling for differences in baseline age, and the random intercept for subject_ID allows each subject's intercept (at the study inception point) to vary. You could include interactions among the fixed effects at this stage if that makes sense for your research question.

Subsequent to this, you might want to include one or more of the fixed-effects variables as random coefficients (slopes) if their effects are thought to vary between subjects, such as timepoint, which you mention. Note that this is not the same as your statement in the question: "where (timepoint | subject_ID) reflects the fact that timepoint varies at the individual level". Obviously timepoint varies at the individual level, because you have repeated measures. The random intercept deals with this, but it assumes that the slope (coefficient) for timepoint is the same for each subject. If you have reason to believe that the slopes should vary between subjects, then you can include one or more variables on the left side of the | symbol. However, if timepoint is a factor, this will result in a separate random effect for each level, which will increase the computational burden and possibly cause numerical problems (assuming you have enough observations to make such a model identifiable to begin with), as well as making the model harder to interpret.

Also note that if timepoint is a factor, then you will get a separate fixed-effect estimate for each level too, which may or may not be what you want (in my experience it is not usually what you want unless the number of levels is small). So, if it isn't already, you might also want to consider coding timepoint as a numeric variable; this will then give you a single fixed main effect. This will model linear growth in your outcome/response, giving each subject their own intercept. If you also add timepoint as a random slope, then you can allow each subject to have their own slope. If you want to cater for non-linear growth, then you could add a quadratic term for timepoint (centering it first to avoid collinearity).
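
The progression described above (random intercept, then random slope, then a centered quadratic term) might look roughly like the following sketch. The data are simulated purely for illustration; the sample size, effect sizes, and the helper column time_c are assumptions, not part of the original question.

```r
library(lme4)

# Simulated (hypothetical) data: 40 subjects, 3 timepoints, 2 groups
set.seed(42)
n_subj <- 40
mydat <- data.frame(
  subject_ID   = factor(rep(seq_len(n_subj), each = 3)),
  timepoint    = rep(0:2, times = n_subj),  # numeric, 0 = baseline
  Group        = factor(rep(rep(c("control", "dx"), each = 3),
                            times = n_subj / 2)),
  baseline_age = rep(runif(n_subj, 20, 60), each = 3)
)

# Subject-specific deviations in intercept (u0) and slope (u1)
u0 <- rep(rnorm(n_subj, sd = 2), each = 3)
u1 <- rep(rnorm(n_subj, sd = 0.5), each = 3)
mydat$DV <- 10 + u0 + (1 + u1) * mydat$timepoint +
  1.5 * (mydat$Group == "dx") + 0.05 * mydat$baseline_age +
  rnorm(nrow(mydat), sd = 1)

# Random intercept only: one common slope for timepoint
m_int <- lmer(DV ~ timepoint + Group + baseline_age + (1 | subject_ID),
              data = mydat)

# Random intercept and random slope for timepoint
m_slope <- lmer(DV ~ timepoint + Group + baseline_age +
                  (timepoint | subject_ID), data = mydat)

# Quadratic growth: center timepoint first, then add its square
mydat$time_c <- mydat$timepoint - mean(mydat$timepoint)
m_quad <- lmer(DV ~ time_c + I(time_c^2) + Group + baseline_age +
                 (1 | subject_ID), data = mydat)

fixef(m_int)   # a single fixed effect for timepoint
```

With only three timepoints per subject, the random-slope model may produce a singular-fit or convergence warning; that is a property of such sparse designs rather than of the syntax.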

Related Solutions

Solved – Unbalanced mixed effect ANOVA for repeated measures

The lme/lmer functions from the nlme/lme4 packages are able to deal with unbalanced designs. You should make sure that time is a numeric variable. You would probably also want to test for different types of curves. The code will look something like this:

library(lme4)
library(lattice)  # provides xyplot()
#plot data with a plot per person including a regression line for each
xyplot(heart.rate ~ time|id, groups=treatment, type= c("p", "r"), data=heart)

#Mixed effects modelling
#variation in intercept by participant
lmera.1 <- lmer(heart.rate ~ treatment * time + (1|id), data=heart)
#variation in intercept and slope without correlation between the two
lmera.2 <- lmer(heart.rate ~ treatment * time + (1|id) + (0+time|id), data=heart)
#As lmera.1 but with correlation between slope and intercept
lmera.3 <- lmer(heart.rate ~ treatment * time + (1+time|id), data=heart)

#Determine which random effects structure fits the data best
anova(lmera.1, lmera.2, lmera.3)

To get quadratic models use the formula "heart.rate ~ treatment * time * I(time^2) + (random effects)".

Update:
In this case, where treatment is a between-subjects factor, I would stick with the model specifications above. I don't think the term (0+treatment|time) is one that you want in the model; to me, it doesn't make sense in this instance to treat time as a random-effects grouping variable.

But to answer your question, "what does the correlation -0.504 mean between treat0 and treat1": this is the correlation coefficient between the two treatments, where each time grouping is one pair of values. This makes more sense if id is the grouping factor and treatment is a within-subjects variable. Then you have an estimate of the correlation between the intercepts of the two conditions.

Before drawing any conclusions about the model, refit lmera.2 with REML = FALSE. Then, with an older lme4 (pvals.fnc relies on MCMC sampling, which lme4 dropped in version 1.0), load the "languageR" package and run:

plmera.2<-pvals.fnc(lmera.2)
plmera.2

Then you can get p-values, but by the looks of it, there is probably a significant effect of time and a significant effect of treatment.

Solved – How to perform linear mixed effect model on longitudinal data in two conditions

The effect of time is allowed to vary between conditions by means of the interaction term Condition * Time.
You need a random part for the subjects (I call it subjectID below) to get a random intercept per subject.
Since all levels of the "experiment" factor are in the data (exp1, exp2 and exp3), and there are only three levels of that factor, you should treat it as a fixed term.

This gives the following model:

fm <- lmer(Measurement ~ 1 + Time * Condition + exp + (1 | subjectID), data = your.data.frame)

The answer to your first question is given by inspecting the estimated coefficients for the interaction term:

summary(fm)

This will give you the coefficients with their standard errors. A convenient way of getting the answer is:

install.packages("effects")
library("effects")
my.eff <- Effect(c("Condition", "Time"), fm)
plot(my.eff)

This snippet will install the package effects, and apply the function Effect() from that package.

The answer to the second question is given directly in the output of summary(fm): it is the test of the interaction coefficient (lme4's summary() prints t values only; loading the lmerTest package before fitting adds p-values). If "WT" is the reference level of "Condition", then the relevant coefficient is labelled "Time:ConditionKO".

The third question requires some calculations, especially for the value of "Condition" which is not the reference level. You need to add the coefficients and calculate the variance of the sum using the formula for the variance of a weighted sum of normally distributed random variables.

Or you can extract the confidence interval from the object my.eff:

summary(my.eff)
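
For the manual route (adding the coefficients and computing the variance of their sum), a rough sketch is given here. The data are simulated and hypothetical, the exp term is left out for brevity, and the interval is a simple Wald interval based on a normal approximation, which is an assumption rather than the only reasonable choice.

```r
library(lme4)

# Simulated (hypothetical) data: 30 subjects, 4 time points, WT vs. KO
set.seed(1)
n_subj <- 30
d <- data.frame(
  subjectID = factor(rep(seq_len(n_subj), each = 4)),
  Time      = rep(0:3, times = n_subj),
  Condition = factor(rep(rep(c("WT", "KO"), each = 4), times = n_subj / 2),
                     levels = c("WT", "KO"))  # WT is the reference level
)
d$Measurement <- 5 + rep(rnorm(n_subj), each = 4) + 1.0 * d$Time +
  0.5 * d$Time * (d$Condition == "KO") + rnorm(nrow(d), sd = 0.5)

fm <- lmer(Measurement ~ Time * Condition + (1 | subjectID), data = d)

b <- fixef(fm)             # fixed-effect estimates
V <- as.matrix(vcov(fm))   # their variance-covariance matrix

# Time slope in KO = slope in WT + interaction coefficient
slope_KO <- b[["Time"]] + b[["Time:ConditionKO"]]

# Var(a + b) = Var(a) + Var(b) + 2 * Cov(a, b)
se_KO <- sqrt(V["Time", "Time"] +
              V["Time:ConditionKO", "Time:ConditionKO"] +
              2 * V["Time", "Time:ConditionKO"])

# Approximate 95% Wald confidence interval for the KO slope
ci_KO <- slope_KO + c(-1.96, 1.96) * se_KO
ci_KO
```

For the reference level (WT), the slope and its standard error can be read directly from the Time row of summary(fm).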

If you provide a sample of your data, I could include the exact commands needed.
