In mixed models, the treatment of factors as either fixed or random, particularly in conjunction with whether they are crossed, partially crossed or nested, can lead to a lot of confusion. There also appear to be differences in terminology between what is meant by nesting in the ANOVA/designed-experiments world and in the mixed/multilevel-models world.
I don't profess to know all the answers, and my answer won't be complete (and may raise further questions), but I will try to address some of the issues here:
Does it make sense for a fixed effect to be nested within a random one, or how to code repeated measures in R (aov and lmer)?
(the question title)
No, I don't believe this makes sense. When we are dealing with repeated measures, whatever the thing is that the measures are repeated on will usually be random; let's just call it `Subject`, and in `lme4` we will want to include `Subject` on the right side of one or more `|` in the random part of the formula. If we have other random effects, then these are either crossed, partially crossed or nested, and my answer to this question addresses that.
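As a small illustration of the distinction (a toy sketch with made-up factors, not from any of the linked questions), `xtabs` makes the difference between crossed and nested grouping factors visible: a crossed factor fills the whole table, while a nested factor has exactly one non-zero cell per level.

```r
# Toy data (hypothetical): 'item' is crossed with 'subject' (every subject
# sees every item), while 'pup' is nested within 'subject' (each pup
# belongs to exactly one subject).
crossed <- expand.grid(subject = paste0("S", 1:3), item = paste0("I", 1:4))
nested  <- data.frame(subject = rep(paste0("S", 1:3), each = 2),
                      pup     = paste0("P", 1:6))

xtabs(~ subject + item, data = crossed)  # all cells non-zero: crossed
xtabs(~ subject + pup,  data = nested)   # one non-zero cell per pup: nested
```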
The issue with these anova-type designed experiments seems to be how to deal with factors that would normally be thought of as fixed, in a repeated measures situation, and the questions in the body of the OP speak to this:
Why `Error(subject/A)` and not `Error(subject)`?
I do not usually use `aov()`, so I could be missing something, but for me `Error(subject/A)` is very misleading in the case of the linked question. `Error(subject)` in fact leads to exactly the same results.
Is it `(1|subject)` or `(1|subject) + (1|A:subject)` or simply `(1|A:subject)`?
This relates to this question. In this case, all of the following random-effects formulations lead to exactly the same result:

- `(1|subject)`
- `(1|A:subject)`
- `(1|subject) + (1|A:subject)`
- `(1|subject) + (1|A:subject) + (1|B:subject)`
However, this is because the simulated dataset in the question has no variation within anything; it is just created with `Y = rnorm(48)`. If we take a real dataset such as the `cake` dataset in the `lme4` package, we find that this will not generally be the case. From the documentation, here is the experimental setup:
Data on the breakage angle of chocolate cakes made with three different recipes and baked at six different temperatures. This is a split-plot design with the recipes being whole-units and the different temperatures being applied to sub-units (within replicates). The experimental notes suggest that
the replicate numbering represents temporal ordering.
A data frame with 270 observations on the following 5 variables:

- `replicate`: a factor with levels 1 to 15
- `recipe`: a factor with levels A, B and C
- `temperature`: an ordered factor with levels 175 < 185 < 195 < 205 < 215 < 225
- `temp`: numeric value of the baking temperature (degrees F)
- `angle`: a numeric vector giving the angle at which the cake broke
So we have repeated measures within `replicate`, and we are also interested in the fixed factors `recipe` and `temperature` (we can ignore `temp`, since it is just a different coding of `temperature`). We can visualise the situation using `xtabs`:
```
> xtabs(~recipe+replicate,data=cake)
      replicate
recipe 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
     A 6 6 6 6 6 6 6 6 6  6  6  6  6  6  6
     B 6 6 6 6 6 6 6 6 6  6  6  6  6  6  6
     C 6 6 6 6 6 6 6 6 6  6  6  6  6  6  6
```
If `recipe` were a random effect, we would say that these are crossed random effects: in no way does recipe A belong to replicate 1 or to any other replicate.
```
> xtabs(~temp+replicate,data=cake)
    replicate
temp 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
 175 3 3 3 3 3 3 3 3 3  3  3  3  3  3  3
 185 3 3 3 3 3 3 3 3 3  3  3  3  3  3  3
 195 3 3 3 3 3 3 3 3 3  3  3  3  3  3  3
 205 3 3 3 3 3 3 3 3 3  3  3  3  3  3  3
 215 3 3 3 3 3 3 3 3 3  3  3  3  3  3  3
 225 3 3 3 3 3 3 3 3 3  3  3  3  3  3  3
```
Similarly for `temp`.
So the first model we might fit is:
```
> lmm1 <- lmer(angle ~ recipe * temperature + (1|replicate), cake, REML = FALSE)
```
This will treat each `replicate` as the only source of random variation (other than the residual, of course). But there could be random differences between recipes, so we might be tempted to include `recipe` as another (crossed) random effect. That would be ill-advised, however, because we have only 3 levels of `recipe`, so we can't expect the model to estimate the variance component well. Instead, we can use `replicate:recipe` as the grouping variable, which lets us treat each combination of replicate and recipe as a separate level of the grouping factor. So whereas the above model has 15 random intercepts for the levels of `replicate`, we will now have 45 random intercepts, one for each combination:
```r
lmm3 <- lmer(angle ~ recipe * temperature + (1|replicate:recipe), cake, REML = FALSE)
```
Note that we now get (very slightly) different results, indicating that there is some random variability due to recipe, but not a great deal.
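To put a number on "not a great deal", a quick sketch compares the two fits with `AIC` and inspects the grouping structure with `ngrps`; this assumes the `lme4` package is installed (the `cake` data ship with it).

```r
library(lme4)

# Refit both models from above on the cake data
lmm1 <- lmer(angle ~ recipe * temperature + (1|replicate), cake, REML = FALSE)
lmm3 <- lmer(angle ~ recipe * temperature + (1|replicate:recipe), cake, REML = FALSE)

# Both models have the same number of parameters, so the AICs are
# directly comparable
AIC(lmm1, lmm3)

# The grouping structure differs: 15 replicates vs 45 replicate:recipe cells
ngrps(lmm1)
ngrps(lmm3)
```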
We could likewise do the same thing with `temperature`.
Now, going back to your question, you also ask:

Why `(1|subject) + (1|A:subject)` and not `(1|subject) + (0+A|subject)` or even simply `(A|subject)`?
I'm not entirely sure where this (using random slopes) comes from; it doesn't seem to arise in the two linked questions. But my problem with `(1|subject) + (1|A:subject)` is that it is exactly the same as `(1|subject/A)`, which means that `A` is nested within `subject`, which in turn means (to me) that each level of `A` occurs in one and only one level of `subject`, which is clearly not the case here.
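That nesting claim can be checked directly. Here is a minimal base-R sketch with made-up data mimicking a repeated-measures layout: a factor is nested in another only when each of its levels appears under exactly one level of the outer factor (`lme4` also provides `isNested()` for the same check).

```r
# TRUE only if each level of 'inner' occurs under exactly one level of 'outer'
is_nested <- function(inner, outer) {
  all(rowSums(table(inner, outer) > 0) == 1)
}

# Hypothetical repeated-measures layout: every subject sees both levels of A
subject <- rep(paste0("S", 1:4), each = 2)
A       <- rep(c("a1", "a2"), times = 4)

is_nested(A, subject)  # FALSE: A is crossed with subject, not nested
```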
I will probably add to and/or edit this answer after I've thought about it some more, but I wanted to get my initial thoughts down.
It appears that you have a case of a partially crossed, partially nested design because, if I understand correctly, `day` and `cond` are crossed (i.e. neither is nested in the other), while both appear to be nested within `subject`. `measurement` is an ID variable that indexes the measurement occasion on each day and within each condition, and as such it should not be treated as a random factor, because there is only one observation of the dependent variable for each measurement occasion. Even though they are indexed as 1-4 for each day/condition, they are different measurements (that is, measurement 1 for day 1, condition 0 and measurement 1 for day 1, condition 1 are not the same measurement), and therefore there can be no random variation in it. If you specified it as random with the data coded as above, it would be a mistake.
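A toy tabulation (made-up data following the question's variable names) shows why: every subject/day/cond cell contains exactly one row per measurement index, so there is no replication from which to estimate a variance for `measurement`.

```r
# Hypothetical layout: 3 subjects x 2 days x 2 conditions x 4 measurements
d <- expand.grid(subject = paste0("S", 1:3), day = 1:2,
                 cond = 0:1, measurement = 1:4)

# One observation per measurement index within each subject/day/cond cell
table(d$measurement, interaction(d$subject, d$day, d$cond))
```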
If this is the case, then `lme` is unable to fit such a model, and you could use something like `lme4` instead. You could specify the structure in `lme4` as follows:

```
DV ~ 1 + (1|subject) + (1|day) + (1|cond) + (1|subject:day) + (1|subject:cond)
```
If `measurement` is a measure of time within each `day` or `cond` and you expect some temporal effect, then you could include `measurement` as a fixed effect (and also potentially fit random slopes for it, if the data supported such a model).
However, fitting a model with random intercepts for `day` and `cond` would not be a good idea, because you have only 2 levels of each, so you would be asking the software to estimate the variance of a normally distributed variable from only 2 observations, which does not make sense. So a better way forward is to treat `day` and `cond` as fixed effects, and simply fit random intercepts for `subject`:

```
DV ~ day + cond + (1|subject)
```
The fact that `day` and `cond` were randomly assigned is not relevant.
The same comment as above applies to `measurement` again here. That is, you might want to fit

```
DV ~ day + cond + measurement + (1|subject)
```

and again, you could also have random slopes for `day` and/or `cond` and/or `measurement`, if suggested by domain theory and supported by the data.
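As a hedged sketch of what those models might look like in `lme4` (the data here are simulated noise purely so the calls run; nothing about the fits is meaningful):

```r
library(lme4)
set.seed(1)

# Simulated stand-in for the question's data (column names from the question)
dat <- expand.grid(subject = factor(1:20), day = factor(1:2),
                   cond = factor(0:1), measurement = 1:4)
dat$DV <- rnorm(nrow(dat))

# Random intercepts only, as suggested above
m0 <- lmer(DV ~ day + cond + measurement + (1 | subject), data = dat)

# Adding a random slope for cond, if theory and the data support it
# (with pure noise this will typically be a singular fit, which lmer reports)
m1 <- lmer(DV ~ day + cond + measurement + (1 + cond | subject), data = dat)
```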
Of course, now that we have discarded `day` and `cond` as random effects, you can go back to the `nlme` package if you wish (although `lme4` is really the successor to `nlme` for most cases).
Best Answer
Note: I have changed `patient` to `subject` in your question, to be consistent with the data excerpt. I hope that is correct.

For your first question:
The problem with this is that you are specifying a random effect (in particular, random intercepts) for the `roi:subroi` interaction. However, this has only 4 levels, which is insufficient, at least in a frequentist framework such as `lmer`, as its variance will be estimated poorly and numerical problems may also occur. Furthermore, this interaction is not specified as a fixed effect (which would not be possible anyway in the presence of `roi` as a fixed effect, due to the separation of levels between `roi` and `subroi`), so it would force the interaction to have an overall mean of zero (since the random effects have a mean of zero). So it would be better to remove `(1|roi:subroi)`.

Neither model makes sense to me. For one thing, you are losing power by splitting the dataset. For another, with `lmer(activity ~ cond + (1|subject), data=dat)` you already have an estimate for the fixed effect of `cond`, along with its interaction with `roi`, from your original model, so this won't tell you anything new. For `lmer(activity ~ cond*roi + (1|subject), data=dat)`, since there is now only 1 level of `roi`, this will generate an error along the lines of `contrasts can be applied only to factors with 2 or more levels`, and if you remove `roi` from that model then my previous comment also applies. I wonder if you perhaps meant `lmer(activity ~ cond*subroi + (1|subject), data=dat)`, which would make more sense.
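A hedged sketch of that suggested model (the data frame here is simulated just so the call runs; the column names follow the question, and the sub-ROI labels are made up):

```r
library(lme4)
set.seed(1)

# Hypothetical stand-in for the question's data: 2 sub-ROIs per ROI
dat <- expand.grid(subject = factor(1:10), cond = factor(c("c1", "c2")),
                   subroi = factor(c("r1a", "r1b", "r2a", "r2b")))
dat$roi      <- factor(ifelse(dat$subroi %in% c("r1a", "r1b"), "roi1", "roi2"))
dat$activity <- rnorm(nrow(dat))

# cond crossed with subroi as fixed effects, random intercepts for subject
m <- lmer(activity ~ cond * subroi + (1 | subject), data = dat)
```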