Solved – split-split plot design with unbalanced repeated measures in lme4 or nlme (SAS translation)

lme4-nlmemixed modelrepeated measures

I am sorry if this answer has been answered before but most answers here (e.g. here, or here ) do not really adress my issue (or maybe I just do not see correctly how they do.
I want to use a (linear) mixed effects analysis on my data. My design is quite straightforward:

  • I have two kinds of bacteria (M1 and M2) that are either paired with 8 other bacteria (S1) or not (S2).
  • In case of S1 quadruplicates bottles were measured and in case of S2 just triplicates.
  • The output variable is measured at multiple timepoints (repeated measures) for approx 72h.
  • Then a defined amount of biomass is taken to inoculate new bottles and so on for a total of 4 cycles (SB1 to 4). Hence the measurements of consecutive cycles are not independent (corelation).
  • The amount of timepoints per cycle is not always the same (unbalanced)

Ok, maybe not that straightforward, let's clarify with a picture:
Design
According to one of my colleagues with a lot of experience in mixed models this is a split-split plot design with repeated measures. Sadly, she is a SAS user and I would love to use R (lme or lmer) for the analysis.
However, I'm not here for the translation (that would be something for stack overflow I suppose, although if anyone here could answer it would be nice) but rather for understanding whether my design is indeed split-split with unbalanced repeated measures and how I can construct an adequate model to analyze it in terms of nested, fixed and random effects.

Here is wat I have so far with some dummy data (I try to use split-split plot nomenclature):

output <- rnorm(266)
mainplot <- c(rep("M1",133),rep("M2",133))
subplot <- c(rep("S1",76),rep("S2",57),
             rep("S1",76),rep("S2",57))
subsubplot <-c(rep("SB1",24),rep("SB2",16),rep("SB3",20),rep("SB4",16),
               rep("SB1",18),rep("SB2",12),rep("SB3",15),rep("SB4",12),
               rep("SB1",24),rep("SB2",16),rep("SB3",20),rep("SB4",16),
               rep("SB1",18),rep("SB2",12),rep("SB3",15),rep("SB4",12))
time<-rnorm(266)

mockset <- data.frame(output,mainplot,subplot,subsubplot,time)

mock.lmefit1 <- lme(output ~ 0 + mainplot * subplot * subsubplot, 
               data = mockset,
               random = ~ time|mainplot/subplot/subsubplot)

The mockset really represents my "unbalanced repeated measures" data structure. In the model above I do not account for my "autocorrelation" between subsubplots (i.e. Cycles SB1 through 4) nor do I correct explicitely for the "unbalancedness" of my repeated measures. I am not interested in my intercept.

My colleague suggested to use the following SAS code:

proc mixed data=rdata;
class mainplot subplot subsubplot time;
model output=  mainplot subplot
               subsubplot subplot*subsubplot
               time subplot*time
               subplot*subsubplot*time/ddfm=KENWARDROGER;
random mainplot*subplot mainplot*subplot*subsubplot;
lsmeans subsubplot subplot subplot*subsubplot
        time subplot*time subsubplot*time
        subplot*subsubplot*time;
run;

If I got it right the * in SAS is analogous to the R :, i.e. it shows the interaction between terms.
To me it is not exactly clear how to port this model from SAS to R. I do also not really get why specific effects are fixed here and others are random. One more thing that I do not have an option to give in into R is the ddfm (denominator degrees of freedom for the tests of fixed effects) option of SAS. What is the importance of this? Does anyone know how lme deals with this?

mock.lmefitSAS <- lme(output ~ mainplot + subplot +
                        subsubplot + subplot:subsubplot +
                        time + subplot:time + 
                        subsubplot:time +
                    subplot:subsubplot:time,
                data = mockset,
                random = ~ 1|mainplot:subplot + mainplot:subplot:subsubplot)

This code does not work, because this is not the proper way to give in interaction terms in lme I guess (? error is "invalid formula for groups") but as stated before I'm not here for a SAS-to-R translation…
I would just like to know if and why the SAS model is more correct given my design and what my R implementation is lacking.

Best Answer

The one practical thing I can tell you is that the denominator degrees-of-freedom business is available for lme4 models, using the pbkrtest package or various wrappers for it: see the ?pvalues man page from recent versions of lme4.

library("lme4")
options(contrasts=c("contr.sum","contr.poly"))
m1 <- lmer(output~0+mainplot*subplot*time +
 (time|mainplot:subplot:subsubplot),
           data=mockset)
library("car")
Anova(m1,type="3",test="F")

For the rest, I pretty much just have to agree with you. I'm a bit surprised by your colleague's specification -- I independently reconstructed your R-style syntax before I saw yours, and it doesn't make much sense to me to treat terms involving subsubplot (bottle) as fixed, or to treat mainplot*subplot as random (for both philosophical and practical, i.e. insufficient-replication, reasons). Perhaps she could chime in?