This is a very interesting question. I have been thinking about this and related things for a long time.
For me the key to understanding this is to realise that random intercepts for a grouping factor are not always sufficient to capture the random variation in the data that is in excess of the residual variation. Because of this, we sometimes see models with random intercepts for interactions between a fixed factor and a grouping variable (and even, sometimes, random intercepts just for a fixed factor). Generally we advise that a factor can be fixed or random (intercepts) but not both; however, there are important exceptions, of which your example here is one.
Another hindrance to understanding is that people like me, who come from a multilevel modelling / mixed model background in the observational social and medical sciences, are often caught up in thinking about repeated measures and nested vs crossed random effects, without appreciating that things are a bit different in experimental analysis. More on this a bit later.
Judging by the comments, we have both discovered the same thing. In the context of a repeated measures ANOVA, if you want to obtain the same results with `lmer` then you fit:

y ~ A + B + (1|id) + (1|id:A) + (1|id:B)

where I have discarded factor C without loss of generality. The reason why some people specify `(1|id/A/B)` is that they are using `nlme::lme` and not `lme4::lmer`. I am not sure why this is needed in `lme()`, but I am fairly sure that to replicate a repeated measures ANOVA - where there is variation for each combination of `id` and the factors - you fit the model above in `lmer()`. Note that `(1|id/A/B)` seems similar; however, it is wrong, because it would omit `(1|id:B)` and would also fit `(1|id:A:B)`, which is indistinguishable from the residual variance (as noted in your comment also).
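We can see both points directly from R's formula algebra, and count the levels of the interaction grouping factors. A minimal base-R sketch, using a small hypothetical design with 3 subjects (the factor names match those in the model above):

```r
# the nesting shorthand id/A/B expands to id + id:A + id:A:B --
# note that id:B is missing and id:A:B is included
attr(terms(~ id/A/B), "term.labels")

# a tiny hypothetical design: 3 subjects, a 2x2 factorial within each
id <- factor(rep(1:3, each = 4))
A  <- factor(rep(c(0, 0, 1, 1), times = 3))
B  <- factor(rep(c(0, 1), times = 6))

# (1|id:A) has one intercept per subject-by-A cell: 6 levels for 12 rows
nlevels(interaction(id, A, drop = TRUE))
# (1|id:A:B) has one intercept per row (12 levels for 12 rows),
# so it is confounded with the residual term
nlevels(interaction(id, A, B, drop = TRUE))
```

The last line is the whole problem with `(1|id/A/B)`: a grouping factor with one level per observation cannot be separated from the residual variance.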
It is important to note (and therefore worth repeating) that we only fit this type of model where we have reason to believe that there is variation for each combination of `id` and the factors. Typically with mixed models we would not do this; we need to understand the experimental design. One type of experiment where this is common is the so-called split-plot design where blocking has also been used. This type of experimental design employs randomisation at different "levels" - or rather, at different combinations of factors - and this is why analyses of such experiments often include random intercept terms that at first glance seem odd. However, the random structure is a property of the experimental design, and without knowledge of this it is virtually impossible to select the correct structure.
So, with regard to your actual question, where the experiment has a repeated measures factorial design, we can use our friend, simulation, to investigate further.
We will simulate data for the models:
y ~ A + B + (1|id)
and
y ~ A + B + (1|id) + (1|id:A) + (1|id:B)
and look at what happens when we use both models to analyse both datasets.
library(lme4)  # for lmer()

set.seed(15)
n <- 100 # number of subjects
K <- 4 # number of measurements per subject
# set up covariates
df <- data.frame(id = rep(seq_len(n), each = K),
                 A = rep(c(0, 0, 1, 1), times = n),
                 B = rep(c(0, 1), times = n * 2))
#
df$y <- df$A + 2 * df$B + 3 * df$id + rnorm(n * K)
m0 <- lmer(y ~ A + B + (1|id) , data = df)
m1 <- lmer(y ~ A + B + (1|id) + (1|id:A) + (1|id:B), data = df)
summary(m0)
`m0` is the "right" model for these data because we simply created `y` with fixed effects for `id` (which we capture with random intercepts) and unit residual variance. This is a bit of an abuse but it is convenient and does what we want:
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 842.1869 29.0205
Residual 0.9946 0.9973
Number of obs: 400, groups: id, 100
Fixed effects:
Estimate Std. Error t value
(Intercept) 50.47508 2.90333 17.39
A 1.01277 0.09973 10.15
B 2.06675 0.09973 20.72
as we can see, we recover unit variance in the residual and good estimates for the fixed effects. However:
> summary(m1)
Random effects:
Groups Name Variance Std.Dev.
id:B (Intercept) 0.000e+00 0.0000
id:A (Intercept) 8.724e-03 0.0934
id (Intercept) 8.422e+02 29.0204
Residual 9.888e-01 0.9944
Number of obs: 400, groups: id:B, 200; id:A, 200; id, 100
Fixed effects:
Estimate Std. Error t value
(Intercept) 50.47508 2.90334 17.39
A 1.01277 0.10031 10.10
B 2.06675 0.09944 20.78
This is a singular fit - a zero estimate for the variance of the `id:B` term and close to zero for `id:A` - which we happen to know is correct here because we didn't simulate any variance at those "levels". Also we find:
> anova(m0, m1)
m0: y ~ A + B + (1 | id)
m1: y ~ A + B + (1 | id) + (1 | id:A) + (1 | id:B)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
m0 5 1952.8 1972.7 -971.39 1942.8
m1 7 1956.8 1984.7 -971.39 1942.8 0.0052 2 0.9974
meaning that we prefer the (correct) model `m0`.
So now we simulate data with variation at these "levels" too. Since `m1` is the model we want to simulate from, we can use its design matrix for the random effects:
# design matrix for the random effects
Z <- as.matrix(getME(m1, "Z"))
# design matrix for the fixed effects
X <- model.matrix(~ A + B, data = df)
betas <- c(10, 2, 3) # fixed effects coefficients
D1 <- 1 # SD of random intercepts for id
D2 <- 2 # SD of random intercepts for id:A
D3 <- 3 # SD of random intercepts for id:B
# we simulate random effects
b <- c(rnorm(n*2, sd = D3), rnorm(n*2, sd = D2), rnorm(n, sd = D1))
# the order here is goverened by the order that lme4 creates the Z matrix
# linear predictor
lp <- X %*% betas + Z %*% b
# add residual variance of 1
df$y <- lp + rnorm(n * K)
m2 <- lmer(y ~ A + B + (1|id) + (1|id:A) + (1|id:B), data = df)
m3 <- lmer(y ~ A + B + (1|id), data = df)
summary(m2)
`m2` is the correct model here and we obtain:
Random effects:
Groups Name Variance Std.Dev.
id:B (Intercept) 6.9061 2.6279
id:A (Intercept) 4.4766 2.1158
id (Intercept) 2.9117 1.7064
Residual 0.8704 0.9329
Number of obs: 400, groups: id:B, 200; id:A, 200; id, 100
Fixed effects:
Estimate Std. Error t value
(Intercept) 10.3870 0.3866 26.867
A 1.8123 0.3134 5.782
B 3.0242 0.3832 7.892
the SD for the `id` intercept is a little high, but otherwise we have good estimates for the random and fixed effects. On the other hand:
> summary(m3)
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 6.712 2.591
Residual 8.433 2.904
Number of obs: 400, groups: id, 100
Fixed effects:
Estimate Std. Error t value
(Intercept) 10.3870 0.3611 28.767
A 1.8123 0.2904 6.241
B 3.0242 0.2904 10.414
although the fixed effects point estimates are OK, their standard errors are smaller than those from the correct model, so the uncertainty is understated. The random structure is, of course, completely wrong. And finally:
> anova(m2, m3)
Models:
m3: y ~ A + B + (1 | id)
m2: y ~ A + B + (1 | id) + (1 | id:A) + (1 | id:B)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
m3 5 2138.1 2158.1 -1064.07 2128.1
m2 7 1985.7 2013.7 -985.87 1971.7 156.4 2 < 2.2e-16 ***
showing that we strongly prefer `m2`.
Note this has been edited to address the issue of how to construct the model matrix for the random effects.
I agree that this can be confusing. But before answering, I would just like to be a bit pedantic and mention that multiple membership (and nesting, and crossing) is not a property of the model. It is a property of the experimental/study design, which is then reflected in the data, which is then encapsulated by the model.
Are multiple membership models the same as cross-classified models?
No, they are not. The reason why my answer that you linked to is ambiguous on this is that some people, erroneously in my opinion, use the two terms interchangeably in certain situations (more on this below), when in fact they are quite different.
The example you mentioned, patients in hospitals, is a very good one. The key here is to think about the lowest level of measurement, and where the repeated measures occur. If patients are the lowest level of measurement (that is, there are no repeated measures within patients), then `patient` will not be a grouping variable; that is, we would not fit random intercepts for it, so by definition there cannot be crossed random effects involving `patient`. On the other hand, if there are repeated measures within patients, then we would fit random intercepts for patients, and therefore we would have crossed random effects for patient and hospital. In the former case we would call this a model with multiple membership, but in the latter case we would call it a model with crossed random effects (in reality it will probably be partially nested and partially crossed). Some people seem to consider both to be multiple membership, with the latter just a special case (hence my ambiguous statement in the linked answer). I just think this confuses the situation.
So to give a definition of multiple membership, I would say that it occurs when the lowest-level units "belong" to more than one upper-level unit. So, following the same example:

- where there are no repeated measures within patients, the lowest level unit is the patient; if a patient is treated in more than one hospital, we have multiple membership.
- where there are repeated measures within patients, the lowest level unit is the measurement occasion, which is nested within patients, and patients are (probably partially) crossed with hospitals.
How do we fit them?
In the multilevel modelling world, software such as MLwiN can fit multiple membership models "out of the box". With mixed effects models, things are not straightforward, at least with the packages I am familiar with. The problem is that the data will look something like this:
Y PatientID HospA HospB HospC HospD HospE HospF HospG HospH
0.1 1 1 0 0 0 0 1 0 1
0.5 2 0 1 0 0 0 1 0 0
2.3 3 0 0 1 0 0 1 0 0
0.7 4 1 0 0 0 0 0 1 0
1.0 5 0 1 0 0 0 1 0 1
3.2 6 0 0 0 0 0 1 0 0
2.1 7 0 0 0 0 0 0 1 0
2.6 8 0 0 0 0 1 0 0 1
Other representations of the data are obviously possible but I think this makes most sense, and makes what follows easier to understand. Edit: It also makes the construction of the model matrix for the random effects quite straightforward (see the edit below).
Clearly it does not make any sense to fit random intercepts for each hospital. However, we have repeated measures within hospitals, so we need to account for this somehow, since observations within hospitals are more likely to be similar to each other than to observations in other hospitals. Moreover, not only is there likely to be correlations within hospitals, but each hospital that a patient belongs to contributes to the (single) measured outcome for that patient.
I don't know if there is an agreed-upon way to handle this with mixed models, but Doug Bates and Ben Bolker have both shown how it can be done in `lme4`:

https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q2/006318.html

https://rstudio-pubs-static.s3.amazonaws.com/442445_4a48ad854b3e45168708cfe4f007d544.html

I won't go into the specifics of how to do it in `lme4`, but the idea is to:
- Create a dummy grouping variable (`HospitalID`, with levels A-H in the above example).
- Fit a model with random intercepts for the dummy. Some software (e.g., `lme4`) allows the model to be constructed internally without actually fitting it. We don't need it to be fitted - only to create the model matrix.
- Construct the correct model matrix for the random effects yourself. This will be based on the `HospA`-`HospH` columns of the above example.
- Update the model with the correct model matrix.
- (Re)fit the updated model.
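The third step is simpler than it sounds with the data laid out as above: the model matrix for the random effects is just the block of `HospA`-`HospH` indicator columns. A minimal base-R sketch, re-creating the example table by hand:

```r
# hypothetical data mirroring the table above
dat <- data.frame(
  Y = c(0.1, 0.5, 2.3, 0.7, 1.0, 3.2, 2.1, 2.6),
  PatientID = 1:8,
  HospA = c(1, 0, 0, 1, 0, 0, 0, 0),
  HospB = c(0, 1, 0, 0, 1, 0, 0, 0),
  HospC = c(0, 0, 1, 0, 0, 0, 0, 0),
  HospD = c(0, 0, 0, 0, 0, 0, 0, 0),
  HospE = c(0, 0, 0, 0, 0, 0, 0, 1),
  HospF = c(1, 1, 1, 0, 1, 1, 0, 0),
  HospG = c(0, 0, 0, 1, 0, 0, 1, 0),
  HospH = c(1, 0, 0, 0, 1, 0, 0, 1)
)

# the model matrix for the random effects: one column per hospital,
# one row per patient, 1 where the patient was treated at that hospital
Z <- as.matrix(dat[, grep("^Hosp", names(dat))])
dim(Z)  # 8 patients by 8 hospitals
```

This `Z` is what replaces the model matrix built from the dummy grouping variable in step 4.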
Edit: to address the question of how to construct the model matrix for the random effects
In a mixed model setting, we usually work with the general mixed model formula:
$$ y = X \beta + Zu + \epsilon$$
In the above example, we want to fit random intercepts for hospitals. The purpose of the model matrix $Z$ is to map the relevant random effects, $u$, onto the response. In the above example we have 8 hospitals. Therefore, the random effects (random intercepts) will be a vector of length 8. For simplicity let's say that it is:
$$
u = \begin{bmatrix}
1 \\
2 \\
3 \\
4 \\
5 \\
6 \\
7 \\
8
\end{bmatrix}
$$
Now, if we look at patient 1, they are in hospitals A, F and H. So that patient will get a contribution of 1 from hospital A, 6 from hospital F and 8 from hospital H. We could alternatively write this as:
$$ (1 \times 1) + (0 \times 2) +( 0 \times 3) + (0 \times 4) + (0 \times 5) + (1 \times 6) + (0 \times 7) + (1 \times 8) $$
We can now see that this is exactly the dot product of two vectors:
$$
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 1 & 0 & 1
\end{bmatrix} \begin{bmatrix}
1 \\
2 \\
3 \\
4 \\
5 \\
6 \\
7 \\
8
\end{bmatrix}
$$
We can now observe that the row-vector above is exactly the same as the row in the data for the hospitals:
Y PatientID HospA HospB HospC HospD HospE HospF HospG HospH
0.1 1 1 0 0 0 0 1 0 1
Therefore each row of the model matrix is simply the corresponding row of the hospital "membership" indicators, and the full structure of $Zu$ for the above data is:
$$
Zu = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 1
\end{bmatrix}\begin{bmatrix}
1 \\
2 \\
3 \\
4 \\
5 \\
6 \\
7 \\
8
\end{bmatrix}
$$
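As a quick check, the whole product can be computed in base R, using the same membership matrix and the purely illustrative $u = 1, \dots, 8$ from above:

```r
# membership indicator matrix Z, exactly as in the derivation above
Z <- matrix(c(1, 0, 0, 0, 0, 1, 0, 1,
              0, 1, 0, 0, 0, 1, 0, 0,
              0, 0, 1, 0, 0, 1, 0, 0,
              1, 0, 0, 0, 0, 0, 1, 0,
              0, 1, 0, 0, 0, 1, 0, 1,
              0, 0, 0, 0, 0, 1, 0, 0,
              0, 0, 0, 0, 0, 0, 1, 0,
              0, 0, 0, 0, 1, 0, 0, 1),
            nrow = 8, byrow = TRUE)
u <- 1:8  # illustrative random intercepts, one per hospital

drop(Z %*% u)
# patient 1 receives u[1] + u[6] + u[8] = 1 + 6 + 8 = 15, and so on
```

Each patient's total random-effect contribution is the sum of the intercepts for exactly the hospitals they belong to, which is the defining feature of a multiple membership model.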
Best Answer
Specifying random effect terms in `gamm4` is different from `mgcv`. The syntax I show is provided in this book.
Two random effect terms in `gamm4` are specified in `lme4` style via the `random` argument, for example (with grouping factors `g1` and `g2`):

random = ~ (1|g1) + (1|g2)

If they are nested, it is:

random = ~ (1|g1/g2)
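The nested shorthand can be checked with base R's formula algebra (assuming hypothetical grouping factors `g1` and `g2`; `gamm4` itself is not needed for this):

```r
# g1/g2 expands to g1 + g1:g2, i.e. (1|g1) + (1|g1:g2) in lme4/gamm4 terms
attr(terms(~ g1/g2), "term.labels")
```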