Linear Mixed Models – Insights into Two-Way Repeated Measures Analysis

fixed-effects-modellme4-nlmemixed modelrrandom-effects-model

This is my first endeavor into linear mixed models, and I haven't found an example that uses a fully repeated measures design, so I was hoping that I could get some help.

I have a dataset that looks like this:

Mydata <- data.frame(
Subject = c(1,2,3,4,5),
Condition =c(rep(c("High", "Low"), each=50)),
Size =c(rep(c(8, 20, 50, 70, 100), each=10)),
Estimate =c(10, 12, 15, 18, 8, 12, 12, 10, 14, 8, 25, 36, 29, 45, 38, 28, 36, 29, 40, 36, 68, 75, 65, 78, 60, 69, 74, 63, 80, 62, 85, 99, 84, 100, 90, 82, 99, 88, 102, 85, 140, 150, 190, 180, 200, 130, 160, 190, 190, 210, 8, 6, 9, 8, 10, 7, 7, 8, 9, 12, 20, 21, 25, 30, 26, 22, 23, 22, 30, 25, 45, 40, 50, 60, 55, 40, 45, 55, 57, 58, 70, 80, 60, 80, 75, 75, 78, 65, 60, 70, 100, 115, 120, 125, 110, 110, 105, 120, 120, 110)
)
Mydata <- Mydata[order(Mydata$Subject),]

In the data, I have 2 within-subjects factor:

  • Set size (8, 20, 50, 70, 100); 2 repetitions per level (Note: this is a continuous/ordinal numeric variable)
  • Condition (High vs. low)

Each subject was presented with a dot array of different set sizes (8, 20, 50, 70, 100), with 2 repetitions per set size, and they had to estimate how many dots they saw. Each subject did the same task twice with different levels of calibration (high vs. low). Hence, each row represents a single trial based on condition and set size.

My goal is to examine if there are main effects of Condition and Set size, and if there is a Condition x Set Size interaction.

From what I have read, a linear mixed model is well-suited especially if I want to model Subjects as a random factor (and Condition and Set Size as fixed effects), and to minimize data aggregation across levels of a factor.

My first set of questions are: Can I use the trial-level data as they are now? My actual dataset has 15 levels of set size and 8 repetitions per set size. Or would it be more appropriate to aggregate across repetitions to obtain a mean value per set size before I fit the mixed model?

After reading up on how to use lme4 and nlme, I tried to fit the following random-intercepts-only models using lme4 (I also fitted them with nlme):

baseline.model <- lmer(Estimate ~ 1 + (1|Subject), Mydata)

To test for main effect of set size

setsize.model <- lmer(Estimate ~ Size + (1|Subject), Mydata)
anova(baseline.model, setsize.model)

To test for main effect of condition

condition.model <- lmer(Estimate ~ Size + Condition + (1|Subject), Mydata)
anova(setsize.model, condition.model)

To test for interaction

interaction.model <- lmer(Estimate ~ Size*Condition + (1|Subject), Mydata)
anova(condition.model, interaction.model)

My second set of questions are: Are these models appropriately fitted? Should I be considering nested models, e.g., Size and Condition within Subject? Should I also consider random slopes as it seems reasonable to suppose that the effects of Size and/or Condition, or even Size x Condition may vary across Subjects. I thought this may be critical if I trial-level data tend to be correlated within subjects. If so, is this the correct way to fit it?

interaction.model <- lmer(Estimate ~ Size*Condition + (1 + Condition + Size + Condition*Size|Subject), Mydata)

My final question is: With an expected Size x Condition interaction, how can I perform post-hoc tests to examine if there is an effect of Condition for each level of Size (I'm hypothesizing Condition effects for larger set sizes, but not for small set sizes)? I've tried the following, but the parameter estimates are identical for the 2 levels of Conditions for each level of Size, so I was wondering if the model was fitted wrongly in the first place:

lsmeans(interaction.model, ~ Condition | Size, adjust="tukey")

I would greatly appreciate any advice! If there is anything that I should clarify, please let me know, and I will provide as much details as I can.

Thanks!

Best Answer

A linear mixed model is what you want. First, make sure that Subject is a factor:

Mydata$Subject <- as.factor(Mydata$Subject)

Then, I would fit the model with saturated fixed- and random-effects structures:

mod1 <- lmer(Estimate ~ Condition + Size +  Condition * Size + # Fixed effects
               (1 + Condition + Size | Subject), # Random effects, nested within subject
             data=Mydata, REML=TRUE) # Specifying data and estimation

I know part of the formula is redundant, but I wanted to make it clear as possible to anyone reading in the future.

Note that the model with the fixed effects structure of (1 + Condition + Size + Condition*Size | Subject) is not identified and will not converge.

You want Condition specified as a random effect because this allows the variance at different levels of condition to be different, and Size lets the slope of Size be different across people.

Then I would do a top-down testing procedure. I would then set up identical models, but take out the random slopes one-by-one:

mod2 <- lmer(Estimate ~ Condition + Size +  Condition * Size + # Fixed effects
               (1 + Size | Subject), # Random effects, nested within subject
             data=Mydata, REML=TRUE) # Specifying data and estimation

mod3 <- lmer(Estimate ~ Condition + Size +  Condition * Size + # Fixed effects
               (1 + Condition | Subject), # Random effects, nested within subject
             data=Mydata, REML=TRUE) # Specifying data and estimation

anova(mod1,mod2,refit=FALSE) # tests for significance of condition random effecct
anova(mod1,mod3,refit=FALSE) # tests for significance of size random effecct

Note that you are estimating a random slope and covariance between slope and intercept at the same time with those two anova tests, so you will want to adjust your p-values accordingly.

If removing the slopes yields a significant difference, leave them in; if the fit is just as good when you take them out, then you can leave them out.

Then you can do the same with fixed effects, but you can specify REML=FALSE for those and there is no need to adjust the p-value. You can also use the t-tests from lmerTest; they generally give the same results.

It is unclear as to if Size is continuous or not. You say that it is, but then you talk about there being different "levels" of it and wanting to do post-hoc tests to compare it at different levels. If you are treating it as linear and continuous, then none of that matters; if you want to do a post-hoc test for a continuous moderator, you could look at simple slope analyses.