I see that there is a multiple-membership tag, but I can't find a good explanation of what a multiple membership model is, or how to go about fitting one.
In my limited understanding, it seems very similar to a cross-classified model. That is, units at one level don't "belong" to a single unit at another level; they can belong to many. So, in a healthcare setting, a patient might be treated in one hospital for one condition and in another hospital for another condition, so patients are not nested in hospitals; they seem crossed. Is this multiple membership? If so, how is it different from a cross-classified model? I know that cross-classified models are very common in the mixed modelling world, so I assume the same is true of multiple membership, although I do not see much about multiple membership in the mixed models literature.
Are multiple membership models the same as cross-classified models? In this answer, it is stated:
"the latter is a crossed design (some might also call it multiple membership)"
This leads me to think that they are the same, although it is somewhat ambiguous.
If not, then what are they and how do we fit them?
Best Answer
Note: this answer has been edited to address the issue of how to construct the model matrix for the random effects.
I agree that this can be confusing. But before answering, I would just like to be a bit pedantic and mention that multiple membership (and nesting, and crossing) is not a property of the model. It is a property of the experimental/study design, which is then reflected in the data, which is then encapsulated by the model.
No, they are not. The reason my answer that you linked to is ambiguous on this is that some people, erroneously in my opinion, use the two terms interchangeably in certain situations (more on this below), when in fact they are quite different.
The example you mentioned, patients in hospitals, is a very good one. The key here is to think about the lowest level of measurement, and where the repeated measures occur. If patients are the lowest level of measurement (that is, there are no repeated measures within patients), then `patient` will not be a grouping variable; that is, we would not fit random intercepts for it, so by definition there cannot be crossed random effects involving `patient`. On the other hand, if there are repeated measures within patients, then we would fit random intercepts for patients, and therefore we would have crossed random effects for patient and hospital. In the former case we would call this a model with multiple membership, but in the latter case we would call it a model with crossed random effects (in reality it will probably be partially nested and partially crossed). Some people seem to consider both to be multiple membership, and the latter to be just a special case (hence my ambiguous statement in the linked answer). I just think this confuses the situation.

So to give a definition of multiple membership, I would say this occurs when the lowest-level units "belong" to more than one upper-level unit. Following the same example:

- where there are no repeated measures within patients, the lowest-level unit is the patient; if a patient is treated in more than one hospital, we have multiple membership;
- if there are repeated measures within patients, then the lowest-level unit is the measurement occasion, which is nested within patients, and patients are (probably partially) crossed with hospitals.
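The two cases above can be told apart mechanically. The helper below is a hypothetical sketch (the name `classify_design` and its input format are my own assumptions, not from any package). It takes one `(patient, hospitals)` pair per measured outcome and applies the definition directly: if any lowest-level unit belongs to more than one hospital, the design is multiple membership; otherwise, with repeated measures, a patient seen in several hospitals makes patients (possibly partially) crossed with hospitals.

```python
from collections import defaultdict

def classify_design(measurements):
    """Hypothetical helper: classify a patient/hospital design.

    `measurements` holds one (patient_id, hospitals) pair per measured outcome,
    where `hospitals` lists every hospital that contributes to that outcome.
    """
    outcomes = defaultdict(list)  # patient_id -> one hospital set per outcome
    for patient, hospitals in measurements:
        outcomes[patient].append(frozenset(hospitals))

    # Lowest-level units are patients (one outcome each) or measurement
    # occasions (several outcomes per patient); each has one hospital set.
    if any(len(s) > 1 for sets in outcomes.values() for s in sets):
        return "multiple membership"

    repeated = any(len(sets) > 1 for sets in outcomes.values())
    # A patient whose occasions span several hospitals means patients are
    # (possibly only partially) crossed with hospitals.
    if repeated and any(len(frozenset().union(*sets)) > 1 for sets in outcomes.values()):
        return "crossed"
    return "nested"

print(classify_design([(1, ["A", "F", "H"]), (2, ["B", "F"])]))  # multiple membership
print(classify_design([(1, ["A"]), (1, ["F"]), (2, ["B"])]))     # crossed
print(classify_design([(1, ["A"]), (1, ["A"]), (2, ["B"])]))     # nested
```

The function does not distinguish fully from partially crossed designs; it only flags that at least one patient spans more than one hospital.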
In the multilevel modelling world, software such as MLwiN can fit multiple membership models "out of the box". With mixed effects models, things are not as straightforward, at least with the packages I am familiar with. The problem is that the data will look something like this:
Other representations of the data are obviously possible but I think this makes most sense, and makes what follows easier to understand. Edit: It also makes the construction of the model matrix for the random effects quite straightforward (see the edit below).
Clearly it does not make any sense to fit random intercepts for each hospital. However, we have repeated measures within hospitals, so we need to account for this somehow, since observations within hospitals are more likely to be similar to each other than to observations in other hospitals. Moreover, not only are there likely to be correlations within hospitals, but each hospital that a patient belongs to contributes to the (single) measured outcome for that patient.
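The correlation argument can be made concrete. Assuming independent hospital effects $u \sim N(0, \sigma_u^2 I)$, the hospital part of the outcomes, $Zu$, has covariance $\sigma_u^2 Z Z^\top$, and entry $(i, j)$ of $Z Z^\top$ is the number of hospitals that patients $i$ and $j$ have in common. The sketch below builds the `HospA`-`HospH` indicator columns from per-patient membership lists (read off the rows of the $Z$ matrix given later in this answer; the dict layout and the function name `membership_matrix` are my own) and computes those counts.

```python
import numpy as np

HOSPITALS = list("ABCDEFGH")

# Hospital memberships for patients 1-8, read off the rows of the Z matrix in the answer.
memberships = {
    1: ["A", "F", "H"],
    2: ["B", "F"],
    3: ["C", "F"],
    4: ["A", "G"],
    5: ["B", "F", "H"],
    6: ["F"],
    7: ["G"],
    8: ["E", "H"],
}

def membership_matrix(memberships, hospitals=HOSPITALS):
    """One row per patient, one 0/1 column per hospital (HospA ... HospH)."""
    col = {h: j for j, h in enumerate(hospitals)}
    patients = sorted(memberships)
    Z = np.zeros((len(patients), len(hospitals)), dtype=int)
    for i, p in enumerate(patients):
        for h in memberships[p]:
            Z[i, col[h]] = 1
    return Z

Z = membership_matrix(memberships)
shared = Z @ Z.T     # entry (i, j): number of hospitals patients i and j share
print(shared[0, 4])  # patients 1 and 5 share F and H -> 2
```

Patients who share more hospitals get a larger covariance, and a patient's own variance from the hospital part grows with the number of hospitals they attend (the diagonal of `shared`).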
I don't know if there is an agreed-upon way to handle this with mixed models, but Doug Bates and Ben Bolker have both shown how it can be done in `lme4`:

https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q2/006318.html
https://rstudio-pubs-static.s3.amazonaws.com/442445_4a48ad854b3e45168708cfe4f007d544.html
I won't mention the specifics of how to do it in `lme4`, but the idea is to … `HospitalID` with levels `A`-`H` using the above example). … `lme4`) allows the model to be constructed internally without actually fitting it. We don't need it to be fitted, only to create the model matrix. … `HospA`-`HospH` columns of the above example.

Edit: to address the question of how to construct the model matrix for the random effects
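The swap idea described just above can be mimicked in a few lines of numpy. This is an analogue only, not `lme4` code. The names `one_hot_Z`, `dummy_hospital_id` and `Z_mm` are hypothetical, and the membership values are read off the $Z$ matrix given in this edit. An ordinary random-intercept $Z$ built from a single grouping factor has exactly one 1 per row. The `HospA`-`HospH` columns have the same shape but may contain several 1s per row, so replacing one with the other leaves the number of random effects (one per hospital) unchanged.

```python
import numpy as np

HOSPITALS = list("ABCDEFGH")

# Wide data: one 0/1 membership column per hospital, one row per patient.
data = {
    "HospA": [1, 0, 0, 1, 0, 0, 0, 0],
    "HospB": [0, 1, 0, 0, 1, 0, 0, 0],
    "HospC": [0, 0, 1, 0, 0, 0, 0, 0],
    "HospD": [0, 0, 0, 0, 0, 0, 0, 0],
    "HospE": [0, 0, 0, 0, 0, 0, 0, 1],
    "HospF": [1, 1, 1, 0, 1, 1, 0, 0],
    "HospG": [0, 0, 0, 1, 0, 0, 1, 0],
    "HospH": [1, 0, 0, 0, 1, 0, 0, 1],
}

def one_hot_Z(group_ids, levels):
    """Z for an ordinary (single-membership) random intercept: one 1 per row."""
    index = {level: j for j, level in enumerate(levels)}
    Z = np.zeros((len(group_ids), len(levels)), dtype=int)
    for i, g in enumerate(group_ids):
        Z[i, index[g]] = 1
    return Z

# Hypothetical dummy grouping factor using all levels A-H, one per patient.
dummy_hospital_id = ["A", "B", "C", "D", "E", "F", "G", "H"]
Z_dummy = one_hot_Z(dummy_hospital_id, HOSPITALS)

# Swap in the HospA-HospH columns: same shape, but rows may contain several 1s.
Z_mm = np.column_stack([data["Hosp" + h] for h in HOSPITALS])
assert Z_mm.shape == Z_dummy.shape
print(Z_dummy.sum(axis=1).tolist())  # [1, 1, 1, 1, 1, 1, 1, 1]
print(Z_mm.sum(axis=1).tolist())     # [3, 2, 2, 2, 3, 1, 1, 2]
```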
In a mixed model setting, we usually work with the general mixed model formula:
$$ y = X \beta + Zu + \epsilon$$
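Once the multiple-membership $Z$ is available, fitting follows the usual route in principle. With $u \sim N(0, \sigma_u^2 I)$ and $\epsilon \sim N(0, \sigma^2 I)$, $y$ has marginal covariance $V = \sigma_u^2 Z Z^\top + \sigma^2 I$, and the variance components can be estimated by maximizing the marginal likelihood. The sketch below uses plain maximum likelihood (not REML), profiling $\beta$ out by GLS, on simulated data. The sample sizes, parameter values and the function name `neg_log_lik` are all hypothetical. This is not how `lme4` or MLwiN work internally, and because it forms the dense $n \times n$ matrix $V$, it is only practical for small data sets.

```python
import numpy as np
from scipy.linalg import solve_triangular
from scipy.optimize import minimize

def neg_log_lik(log_params, y, X, Z):
    """Negative ML log-likelihood of y = X beta + Z u + eps with
    u ~ N(0, s2u I) and eps ~ N(0, s2e I); beta is profiled out by GLS."""
    s2u, s2e = np.exp(log_params)             # log scale keeps variances positive
    n = len(y)
    V = s2u * (Z @ Z.T) + s2e * np.eye(n)     # marginal covariance of y
    L = np.linalg.cholesky(V)                 # V = L L^T
    y_w = solve_triangular(L, y, lower=True)  # whitened response L^{-1} y
    X_w = solve_triangular(L, X, lower=True)  # whitened design L^{-1} X
    beta, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)  # GLS estimate of beta
    r = y_w - X_w @ beta
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * (n * np.log(2.0 * np.pi) + log_det + r @ r)

# Simulated multiple-membership data (hypothetical sizes and parameters).
rng = np.random.default_rng(1)
n, q = 300, 20                                # patients, hospitals
Z = np.zeros((n, q))
for i in range(n):
    k = rng.integers(1, 4)                    # each patient attends 1-3 hospitals
    Z[i, rng.choice(q, size=k, replace=False)] = 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = rng.normal(scale=1.0, size=q)             # true s2u = 1
y = X @ np.array([2.0, 0.5]) + Z @ u + rng.normal(scale=0.5, size=n)  # true s2e = 0.25

start = np.log([0.5, 0.5])
fit = minimize(neg_log_lik, start, args=(y, X, Z), method="Nelder-Mead")
s2u_hat, s2e_hat = np.exp(fit.x)
print(f"s2u = {s2u_hat:.3f}, s2e = {s2e_hat:.3f}")
```

Nothing in the likelihood depends on each row of $Z$ having a single 1, which is why swapping in the membership matrix is all that changes relative to an ordinary random-intercept model.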
In the above example, we want to fit random intercepts for hospitals. The purpose of the model matrix $Z$ is to map the relevant random effects, $u$, onto the response. In the above example we have 8 hospitals. Therefore, the random effects (random intercepts) will be a vector of length 8. For simplicity let's say that it is:
$$ u = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \\ 6 \\ 7 \\ 8 \end{bmatrix} $$
Now, if we look at patient 1, they are in hospitals `A`, `F` and `H`. So that patient will get a contribution of 1 from hospital `A`, 6 from hospital `F` and 8 from hospital `H`. We could alternatively write this as:

$$ (1 \times 1) + (0 \times 2) + (0 \times 3) + (0 \times 4) + (0 \times 5) + (1 \times 6) + (0 \times 7) + (1 \times 8) = 15 $$
We can now see that this is exactly the dot product of two vectors:
$$ \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \\ 6 \\ 7 \\ 8 \end{bmatrix} $$
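The same dot product in numpy is shown below, using the illustrative intercepts $u = (1, \dots, 8)$ from above. The variable names are my own.

```python
import numpy as np

u = np.arange(1, 9)                              # illustrative intercepts for hospitals A-H: 1, ..., 8
z_patient1 = np.array([1, 0, 0, 0, 0, 1, 0, 1])  # patient 1 is in hospitals A, F and H
contribution = z_patient1 @ u                    # dot product = 1 + 6 + 8
print(contribution)  # 15
```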
We can now observe that the row-vector above is exactly the same as the row in the data for the hospitals:
Therefore each row of the model matrix is simply the corresponding row of the hospital "membership" indicators, and the full structure of $Zu$ for the above data is:
$$ Zu = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \\ 6 \\ 7 \\ 8 \end{bmatrix} $$
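As a numerical check, the sketch below multiplies the matrix above by the same illustrative $u$. The first entry reproduces patient 1's total of 15. The values $1, \dots, 8$ are purely for illustration; fitted random intercepts would be centred on zero.

```python
import numpy as np

# Hospital membership indicators (columns A-H), one row per patient, as in the matrix above.
Z = np.array([
    [1, 0, 0, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 0, 1],
])
u = np.arange(1, 9)  # illustrative random intercepts 1, ..., 8 for hospitals A-H

Zu = Z @ u  # each patient's summed contribution from every hospital they belong to
print(Zu.tolist())  # [15, 8, 9, 8, 16, 6, 7, 13]
```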