I am sorry if this answer has been answered before but most answers here (e.g. here, or here ) do not really adress my issue (or maybe I just do not see correctly how they do.
I want to use a (linear) mixed effects analysis on my data. My design is quite straightforward:
- I have two kinds of bacteria (M1 and M2) that are either paired with 8 other bacteria (S1) or not (S2).
- In case of S1 quadruplicates bottles were measured and in case of S2 just triplicates.
- The output variable is measured at multiple timepoints (repeated measures) for approx 72h.
- Then a defined amount of biomass is taken to inoculate new bottles and so on for a total of 4 cycles (SB1 to 4). Hence the measurements of consecutive cycles are not independent (corelation).
- The amount of timepoints per cycle is not always the same (unbalanced)
Ok, maybe not that straightforward, let's clarify with a picture:
According to one of my colleagues with a lot of experience in mixed models this is a split-split plot design with repeated measures. Sadly, she is a SAS user and I would love to use R (lme or lmer) for the analysis.
However, I'm not here for the translation (that would be something for stack overflow I suppose, although if anyone here could answer it would be nice) but rather for understanding whether my design is indeed split-split with unbalanced repeated measures and how I can construct an adequate model to analyze it in terms of nested, fixed and random effects.
Here is wat I have so far with some dummy data (I try to use split-split plot nomenclature):
output <- rnorm(266)
mainplot <- c(rep("M1",133),rep("M2",133))
subplot <- c(rep("S1",76),rep("S2",57),
rep("S1",76),rep("S2",57))
subsubplot <-c(rep("SB1",24),rep("SB2",16),rep("SB3",20),rep("SB4",16),
rep("SB1",18),rep("SB2",12),rep("SB3",15),rep("SB4",12),
rep("SB1",24),rep("SB2",16),rep("SB3",20),rep("SB4",16),
rep("SB1",18),rep("SB2",12),rep("SB3",15),rep("SB4",12))
time<-rnorm(266)
mockset <- data.frame(output,mainplot,subplot,subsubplot,time)
mock.lmefit1 <- lme(output ~ 0 + mainplot * subplot * subsubplot,
data = mockset,
random = ~ time|mainplot/subplot/subsubplot)
The mockset really represents my "unbalanced repeated measures" data structure. In the model above I do not account for my "autocorrelation" between subsubplots (i.e. Cycles SB1 through 4) nor do I correct explicitely for the "unbalancedness" of my repeated measures. I am not interested in my intercept.
My colleague suggested to use the following SAS code:
proc mixed data=rdata;
class mainplot subplot subsubplot time;
model output= mainplot subplot
subsubplot subplot*subsubplot
time subplot*time
subplot*subsubplot*time/ddfm=KENWARDROGER;
random mainplot*subplot mainplot*subplot*subsubplot;
lsmeans subsubplot subplot subplot*subsubplot
time subplot*time subsubplot*time
subplot*subsubplot*time;
run;
If I got it right the *
in SAS is analogous to the R :
, i.e. it shows the interaction between terms.
To me it is not exactly clear how to port this model from SAS to R. I do also not really get why specific effects are fixed here and others are random. One more thing that I do not have an option to give in into R is the ddfm (denominator degrees of freedom for the tests of fixed effects) option of SAS. What is the importance of this? Does anyone know how lme deals with this?
mock.lmefitSAS <- lme(output ~ mainplot + subplot +
subsubplot + subplot:subsubplot +
time + subplot:time +
subsubplot:time +
subplot:subsubplot:time,
data = mockset,
random = ~ 1|mainplot:subplot + mainplot:subplot:subsubplot)
This code does not work, because this is not the proper way to give in interaction terms in lme I guess (? error is "invalid formula for groups") but as stated before I'm not here for a SAS-to-R translation…
I would just like to know if and why the SAS model is more correct given my design and what my R implementation is lacking.
Best Answer
The one practical thing I can tell you is that the denominator degrees-of-freedom business is available for
lme4
models, using thepbkrtest
package or various wrappers for it: see the ?pvalues man page from recent versions oflme4
.For the rest, I pretty much just have to agree with you. I'm a bit surprised by your colleague's specification -- I independently reconstructed your R-style syntax before I saw yours, and it doesn't make much sense to me to treat terms involving
subsubplot
(bottle) as fixed, or to treatmainplot*subplot
as random (for both philosophical and practical, i.e. insufficient-replication, reasons). Perhaps she could chime in?