R Mixed Model – Correct Specification of Linear Mixed Model with Repeated Measures in R

Tags: lme4-nlme, mixed model, r, repeated measures

I have a dataset with repeated measures data but I'm unsure of the specification of my model and subsequent contrasts:

In the dataset, participants responded to a measure (Score; continuous) at three different timepoints (Time; T1/T2/T3) that were not equally spaced (not sure if this is relevant). I'm informed that Sex and Age (older/younger) may have an impact on the outcome, so they should be included in the model. However, due to a relatively small sample size, and some subgroups/cells potentially being empty, I don't think it would make sense to look at complex interactions between these variables (e.g., Time*Sex*Age), but I would like to look at Time*Sex and Time*Age. So I think the fixed effects are Time, Sex, and Age, and ID would be a random effect (random intercept and possibly a random slope).

So for issue 1: Could someone help me with the correct lme4 syntax? My initial guess is: mod <- lmer(Score ~ Time + Sex + Age + (1|ID), data=data). But I suspect this may be wrong because it doesn't include interactions and I'm not sure about the random aspects.

Then for issue 2: I'm hoping to use planned comparisons to look at differences in scores between T1 and T2. Then to answer a different question, I want to look at scores between T2 and T3, and T1 and T3.

I think the emmeans package could help with this, but I'm wondering about how to involve Sex and Age in the comparisons, if at all. I don't have specific hypotheses that involve these factors, but if they are presumed to have an impact on scores then it makes sense to include them somehow. Would this mean intending to run planned comparisons without them, but if either or both are found to be important in the lmer model, running some kind of post-hoc tests that feature these instead?

Any help would be really appreciated.

Best Answer

First, I would make sure that you have Time as a factor (not just a quantitative variable) in your model.
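As a minimal sketch of that first step, the snippet below builds a small hypothetical data frame (the values are made up; only the column names follow the question) and stores Time as a factor with an explicit level order, so that T1 is the reference level.

```r
# Hypothetical toy data; column names follow the question, values are invented.
data <- data.frame(
  ID    = rep(1:4, each = 3),
  Time  = rep(c("T1", "T2", "T3"), times = 4),
  Score = c(10, 12, 15, 9, 11, 13, 14, 15, 18, 8, 10, 11)
)

# Store Time as a factor with an explicit level order, and ID as a factor too.
data$Time <- factor(data$Time, levels = c("T1", "T2", "T3"))
data$ID   <- factor(data$ID)

str(data)          # confirms the column types
levels(data$Time)  # confirms the level order
```

If Time were instead coded numerically (say 1, 2, 3), the model would fit a single linear trend across the three visits rather than a separate mean per timepoint.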

Second, I would suggest nailing down the issue of "potentially empty cells". Am I correct that you do at least have every subject measured at all 3 times? Or are there holes there? I'm hoping not, but if there are, that may not be a disaster.
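One way to check for such holes, sketched here on invented data in which subject 2 has no T2 measurement, is to cross-tabulate ID by Time and list the subjects that lack at least one timepoint.

```r
# Hypothetical long-format data: subject 2 is missing its T2 measurement.
data <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 3, 3, 3),
  Time = c("T1", "T2", "T3", "T1", "T3", "T1", "T2", "T3")
)

# Rows are subjects, columns are timepoints, entries are observation counts.
obs <- with(data, table(ID, Time))
obs

# Subjects observed at fewer timepoints than there are columns in the table.
incomplete <- rownames(obs)[rowSums(obs > 0) < ncol(obs)]
incomplete
```

A mixed model fitted with lmer can still use the subjects with incomplete data, which is one reason a few holes need not be a disaster.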

I suggest doing something like with(data, table(Sex, Age)) to see if there are missing combinations of Sex and Age. If so, you really can't do a sensible analysis of either factor in its own right, and you might as well combine them into one factor, say data <- transform(data, Group = interaction(Sex, Age))
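A small sketch of that check on invented data, in which no older male participants exist. The drop = TRUE argument is an addition not in the text above: by default interaction() keeps the empty combination as an unused factor level, and dropping it may make later model output easier to read.

```r
# Hypothetical subject-level data with one empty Sex/Age cell (M, older).
data <- data.frame(
  Sex = c("F", "F", "M", "M", "F", "M"),
  Age = c("older", "younger", "younger", "younger", "older", "younger")
)

# Cross-tabulate to look for missing combinations.
cells <- with(data, table(Sex, Age))
cells
any(cells == 0)  # TRUE here, because the M/older cell is empty

# Combine the two factors into one; drop = TRUE discards unobserved levels.
data <- transform(data, Group = interaction(Sex, Age, drop = TRUE))
levels(data$Group)
```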

Then I'd suggest fitting the model including all interactions, because you don't know which ones you can reasonably omit. So the model formula would be Score ~ Sex*Age*Time + (1|ID) or Score ~ Group*Time + (1|ID), depending on the previous paragraph. If each subject is measured at the three times, then a random intercept per subject, (1|ID), is a reasonable choice for the random part of the model.
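For illustration, here is a sketch of the full-interaction fit on simulated data. The data-generating values (40 subjects, the effect sizes, the seed) are arbitrary assumptions chosen only to make the example run; only the formula comes from the text above.

```r
library(lme4)

# Simulated, balanced data: 40 subjects, each measured at T1, T2 and T3.
set.seed(1)
data <- expand.grid(Time = factor(c("T1", "T2", "T3")), ID = factor(1:40))
id <- as.integer(data$ID)
data$Sex <- factor(ifelse(id %% 2 == 0, "M", "F"))
data$Age <- factor(ifelse(id <= 20, "older", "younger"))

# Score = time trend + subject-specific intercept + residual noise.
data$Score <- 50 + 2 * as.integer(data$Time) +
  rnorm(40, sd = 2)[id] + rnorm(nrow(data))

# All interactions among the fixed effects, random intercept per subject.
mod <- lmer(Score ~ Sex * Age * Time + (1 | ID), data = data)
summary(mod)
```

With a single Score per subject per timepoint, a random slope for the factor Time, (Time | ID), would require as many random effects as there are observations, so the random intercept is about as much as this design can support.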

For post hoc analysis, I don't see much validity to trying to pursue a small number of planned comparisons, because it all seems pretty exploratory at this point. Maybe save that for the next study that is more carefully designed and powered, and includes subjects in all age and sex combinations. By the time you finish mentioning what times you want to compare, you have covered all pairwise comparisons. So I suggest something like:

EMM <- emmeans(mod, ~ Sex * Age * Time)  ## or ~ Group * Time
pairs(EMM, simple = "Time")

The second statement will do all three pairwise comparisons of Time for each combination of the other factors.
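A self-contained sketch of those two statements, run on simulated data (the sample size, effect sizes and seed are arbitrary assumptions):

```r
library(lme4)
library(emmeans)

# Simulated, balanced data: 40 subjects, each measured at T1, T2 and T3.
set.seed(2)
data <- expand.grid(Time = factor(c("T1", "T2", "T3")), ID = factor(1:40))
id <- as.integer(data$ID)
data$Sex <- factor(ifelse(id %% 2 == 0, "M", "F"))
data$Age <- factor(ifelse(id <= 20, "older", "younger"))
data$Score <- 50 + 2 * as.integer(data$Time) +
  rnorm(40, sd = 2)[id] + rnorm(nrow(data))

mod <- lmer(Score ~ Sex * Age * Time + (1 | ID), data = data)

# Estimated marginal means for every Sex x Age x Time cell.
EMM <- emmeans(mod, ~ Sex * Age * Time)

# T1-T2, T1-T3 and T2-T3 within each Sex/Age combination.
cmp <- pairs(EMM, simple = "Time")
cmp
```

With two levels each of Sex and Age, this yields four blocks of three comparisons. By default, pairs() applies a Tukey adjustment within each block.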

It may be possible to do simpler comparisons if an anova (e.g., car::Anova(mod)) suggests that Age or Sex does not interact with Time. If so, you can fit a simpler model that excludes those interactions and leave that factor out of the emmeans() specification; the estimates will then be averaged over the levels of that factor.
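A sketch of that workflow on simulated data, assuming for illustration that Sex turns out not to interact with Time. The reduced formula Score ~ Sex + Age * Time + (1 | ID) is one possible simplification, not the only one, and the simulation settings are arbitrary.

```r
library(lme4)
library(emmeans)

# Simulated, balanced data: 40 subjects, each measured at T1, T2 and T3.
set.seed(3)
data <- expand.grid(Time = factor(c("T1", "T2", "T3")), ID = factor(1:40))
id <- as.integer(data$ID)
data$Sex <- factor(ifelse(id %% 2 == 0, "M", "F"))
data$Age <- factor(ifelse(id <= 20, "older", "younger"))
data$Score <- 50 + 2 * as.integer(data$Time) +
  rnorm(40, sd = 2)[id] + rnorm(nrow(data))

# Full model and its analysis-of-deviance table.
mod <- lmer(Score ~ Sex * Age * Time + (1 | ID), data = data)
car::Anova(mod)

# Assumed outcome: no Sex-by-Time interaction, so Sex enters as a main effect only.
mod2 <- lmer(Score ~ Sex + Age * Time + (1 | ID), data = data)

# Sex is omitted from the specification, so results are averaged over its levels.
EMM2 <- emmeans(mod2, ~ Age * Time)
cmp2 <- pairs(EMM2, simple = "Time")
cmp2
```

Here the Time comparisons are reported once per Age group rather than once per Sex/Age combination.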
