Solved – Using lmer for repeated-measures linear mixed-effect model

anovalme4-nlmemixed modelrrepeated measures

EDIT 2: I originally thought I needed to run a two-factor ANOVA with repeated measures on one factor, but I now think a linear mixed-effect model will work better for my data. I think I nearly know what needs to happen, but am still confused by few points.

The experiments I need to analyze look like this:

  • Subjects were assigned to one of several treatment groups
  • Measurements of each subject were taken on multiple days
  • So:
    • Subject is nested within treatment
    • Treatment is crossed with day

(each subject is assigned to only one treatment, and measurements are taken on each subject on each day)

My dataset contains the following information:

  • Subject = blocking factor (random factor)
  • Day = within subject or repeated measures factor (fixed factor)
  • Treatment = between subject factor (fixed factor)
  • Obs = measured (dependent) variable

UPDATE
OK, so I went and talked to a statistician, but he's an SAS user. He thinks that the model should be:

Treatment + Day + Subject(Treatment) + Day*Subject(Treatment)

Obviously his notation is different from the R syntax, but this model is supposed to account for:

  • Treatment (fixed)
  • Day (fixed)
  • the Treatment*Day interaction
  • Subject nested within Treatment (random)
  • Day crossed with "Subject within Treatment" (random)

So, is this the correct syntax to use?

m4 <- lmer(Obs~Treatment*Day + (1+Treatment/Subject) + (1+Day*Treatment/Subject), mydata)

I'm particularly concerned about whether the Day crossed with "Subject within Treatment" part is right. Is anyone familiar with SAS, or confident that they understand what's going on in his model, able to comment on whether my sad attempt at R syntax matches?

Here are my previous attempts at building a model and writing syntax (discussed in answers & comments):

m1 <- lmer(Obs ~ Treatment * Day + (1 | Subject), mydata)

How do I deal with the fact that subject is nested within treatment? How does m1 differ from:

m2 <- lmer(Obs ~ Treatment * Day + (Treatment|Subject), mydata)
m3 <- lmer(Obs ~ Treatment * Day + (Treatment:Subject), mydata)

and are m2 and m3 equivalent (and if not, why)?

Also, do I need to be using nlme instead of lme4 if I want to specify the correlation structure (like correlation = corAR1)? According to Repeated Measures, for a repeated-measures analysis with repeated measures on one factor, the covariance structure (the nature of the correlations between measurements of the same subject) is important.

When I was trying to do a repeated-measures ANOVA, I'd decided to use a Type II SS; is this still relevant, and if so, how do I go about specifying that?

Here's an example of what the data look like:

mydata <- data.frame(
  Subject  = c(13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 29, 30, 31, 32, 33, 
               34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65, 13, 14, 15, 16, 17, 18, 
               19, 20, 21, 22, 23, 24, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 
               40, 62, 63, 64, 65, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 
               29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65), 
  Day       = c(rep(c("Day1", "Day3", "Day6"), each=28)), 
  Treatment = c(rep(c("B", "A", "C", "B", "C", "A", "A", "B", "A", "C", "B", "C", 
                      "A", "A", "B", "A", "C", "B", "C", "A", "A"), each = 4)), 
  Obs       = c(6.472687, 7.017110, 6.200715, 6.613928, 6.829968, 7.387583, 7.367293, 
                8.018853, 7.527408, 6.746739, 7.296910, 6.983360, 6.816621, 6.571689, 
                5.911261, 6.954988, 7.624122, 7.669865, 7.676225, 7.263593, 7.704737, 
                7.328716, 7.295610, 5.964180, 6.880814, 6.926342, 6.926342, 7.562293, 
                6.677607, 7.023526, 6.441864, 7.020875, 7.478931, 7.495336, 7.427709, 
                7.633020, 7.382091, 7.359731, 7.285889, 7.496863, 6.632403, 6.171196, 
                6.306012, 7.253833, 7.594852, 6.915225, 7.220147, 7.298227, 7.573612, 
                7.366550, 7.560513, 7.289078, 7.287802, 7.155336, 7.394452, 7.465383, 
                6.976048, 7.222966, 6.584153, 7.013223, 7.569905, 7.459185, 7.504068, 
                7.801867, 7.598728, 7.475841, 7.511873, 7.518384, 6.618589, 5.854754, 
                6.125749, 6.962720, 7.540600, 7.379861, 7.344189, 7.362815, 7.805802, 
                7.764172, 7.789844, 7.616437, NA, NA, NA, NA))

Best Answer

I think that your approach is correct. Model m1 specifies a separate intercept for each subject. Model m2 adds a separate slope for each subject. Your slope is across days as subjects only participate in one treatment group. If you write model m2 as follows it's more obvious that you model a separate intercept and slope for each subject

m2 <- lmer(Obs ~ Treatment * Day + (1+Day|Subject), mydata)

This is equivalent to:

m2 <- lmer(Obs ~ Treatment + Day + Treatment:Day + (1+Day|Subject), mydata)

I.e. the main effects of treatment, day and the interaction between the two.

I think that you don't need to worry about nesting as long as you don't repeat subject ID's within treatment groups. Which model is correct, really depends on your research question. Is there reason to believe that subjects' slopes vary in addition to the treatment effect? You could run both models and compare them with anova(m1,m2) to see if the data supports either one.

I'm not sure what you want to express with model m3? The nesting syntax uses a /, e.g. (1|group/subgroup).

I don't think that you need to worry about autocorrelation with such a small number of time points.