Mixed Model – Analyzing Mixed Models with Nested and Crossed Random Effects

crossed-random-effects, lme4-nlme, mixed model, nested data, regression

I'm new to mixed effects models and am trying to use the lmer() function from the lme4 R package to specify a random effects structure.

In my experiment, subjects are spread out over 11 non-overlapping groups. Groups were fairly big (hundreds of subjects) and were tested on at least 4 successive days. On each day, subject performance was measured under two experimental conditions (hungry, satiated) and during each of these conditions, each subject contributed zero to potentially many data points. Groups were tested sequentially, i.e., on each day only one group was tested.

After reading through lectures, tutorials, and posts on here, I think my model should look like this

response_time ~ experimental_condition + (1 | group_id/day) + (1 | subject_id)

and the subject_id needs to be unique across the entire data set.

Does this look okay? And under which (hypothetical) circumstances would one nest experimental_condition within day?

Thanks for any help!

Edit: I should have mentioned that day is currently coded as the day of the experiment. In other words, it is not unique across groups (i.e., there is a day 1, day 2, … for each group).

Best Answer

Does this look okay?

Based on your description, and given your research question of estimating the effect of experimental_condition, while accounting for the non-independence of observations due to the random structure your experiment has, this does not look OK to me. The issue is with the random structure, and how to handle the day variable.

It appears that each and every subject belongs to one and only one group. Thus, subjects are nested within groups, so you need the term:

... + (1 | group_id / subject_id) + ...

which will fit random intercepts for each group and each subject within a group.

This leaves the question of how to treat the day variable: fixed or random. There isn't necessarily a black and white answer to this, but see the list of threads at the end of my answer for help on how to choose. The first thing to note is that day has only 4 levels. This isn't necessarily a problem if day is nested within group_id, since there will then be $n_{day} \times n_{group} = 44$ intercepts.

So, if treating day as random and nested within group we would have:

response_time ~ experimental_condition + (1|group_id/subject_id) + (1|group_id/day)

which expands to

response_time ~ experimental_condition + (1|group_id) + (1|group_id:subject_id) + (1|group_id) + (1|group_id:day)

which then simplifies to:

response_time ~ experimental_condition + (1|group_id) + (1|group_id:subject_id) + (1|group_id:day)
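To see why the nested term gives enough levels even though day is coded 1, 2, ... within every group, here is a minimal base R sketch. The design grid is hypothetical (11 groups, exactly 4 days each, as assumed in the count above); it only counts grouping levels and fits no model.

```r
# Hypothetical design grid: 11 groups, each tested on days coded 1..4
# (the day codes repeat across groups, as described in the question's edit)
design <- expand.grid(day = factor(1:4), group_id = factor(1:11))

# day on its own has only 4 levels, because the codes are reused in every group
n_day <- nlevels(design$day)

# (1|group_id:day) groups observations by the group-by-day combination instead
group_day <- interaction(design$group_id, design$day, drop = TRUE)
n_group_day <- nlevels(group_day)

print(c(day = n_day, group_day = n_group_day))
```

Because the grouping factor is the combination, day does not need to be recoded to be unique across groups when it appears as group_id/day or group_id:day.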

Alternatively, if day is not treated as nested within group, we would not fit random intercepts for a factor with only 4 levels, so treating day as fixed makes more sense in that scenario:

response_time ~ experimental_condition + day + (1|group_id/subject_id)

In this latter model, you should consider whether to fit an interaction term in the fixed part, in case the effect of the experimental condition differs by day:

response_time ~ experimental_condition * day + (1|group_id/subject_id)
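One way to check whether the interaction is supported is to fit both fixed-effect structures and compare them. The sketch below does this on simulated stand-in data, not the asker's data: the sample sizes, effect sizes, and the helper column `subject` are all assumptions made for illustration. Both models are fitted with maximum likelihood (REML = FALSE) so that a likelihood ratio comparison of the fixed parts is meaningful.

```r
library(lme4)

set.seed(1)

# Simulated stand-in data (all sizes and effects are assumptions):
# 11 groups, 20 subjects per group, 4 days, 2 conditions, one observation per cell
dat <- expand.grid(
  experimental_condition = factor(c("hungry", "satiated")),
  day = factor(1:4),
  subject = 1:20,
  group_id = factor(1:11)
)

# Make subject_id unique across the whole data set
dat$subject_id <- interaction(dat$group_id, dat$subject, drop = TRUE)

# Random intercepts for groups and subjects, plus residual noise
group_eff <- rnorm(nlevels(dat$group_id), sd = 0.5)
subj_eff <- rnorm(nlevels(dat$subject_id), sd = 1)
dat$response_time <- 5 +
  0.3 * (dat$experimental_condition == "satiated") +
  group_eff[as.integer(dat$group_id)] +
  subj_eff[as.integer(dat$subject_id)] +
  rnorm(nrow(dat))

# Additive model versus model with a condition-by-day interaction
m_add <- lmer(response_time ~ experimental_condition + day +
                (1 | group_id/subject_id), data = dat, REML = FALSE)
m_int <- lmer(response_time ~ experimental_condition * day +
                (1 | group_id/subject_id), data = dat, REML = FALSE)

# Likelihood ratio test of the interaction terms
print(anova(m_add, m_int))
```

Here the simulated data contain no interaction, so the comparison would not be expected to favour the larger model; with real data, subject-matter reasoning should weigh in alongside the test.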

And under which (hypothetical) circumstances would one nest experimental_condition within day?

Nesting experimental_condition within day makes sense if each experimental_condition belongs to one and only one day. That does not seem to be the case with your design. This would also bring up the problem of whether to fit a factor as random or fixed. See the following threads for much discussion on that topic:

What is the difference between fixed effect, random effect and mixed effect models?

How to determine random effects in mixed model

Understanding Random Effects in Linear Mixed Models

Can a variable be included in a mixed model as a fixed effect and as a random effect at the same time?

Choosing Random Effects to Include in a Linear Mixed Model

Related Solutions

Mixed Model – Can Random Effects Be Nested Within a Factor with Only Two Observations in Linear Mixed Models?

There seems to be some confusion.

My understanding is that because there's only 2 participants in each family, I can't specify random slopes because there are insufficient degrees of freedom

This does not make sense to me. In general, random slopes do not make sense when the variable in question does not vary within subjects. So if you have repeated measures within levels of a grouping variable, then you can, in principle, fit random slopes (provided that they are supported by the data).

On the other hand, depending on the level at which you are taking your measurements / observations, the model may be misspecified. If you are measuring variables at the family level (e.g. father's ethnicity or mother's education level), or if you are making repeated measures at the family level (e.g. annual household income over several years, or family address, which may change over time), then the proposed model should be a good place to start. However, if you are making repeated measures per twin, then you will need to fit random intercepts for twin ID, varying within family:

lmer(x ~ y + (1|familyID/twinID), dat)

Regarding the issue of nesting:

I'm wondering if this has any impact on whether you can nest other random effects within the family random effect.

There is no reason why you can't have further random effects nested within family. For example, as mentioned above, if you have repeated measures within individual twins then you would fit nested random effects. Although there are only 2 twins in each family, when fitting nested random effects it is the number of levels of the upper level factor that is important, since:

x ~ y + (1|familyID/twinID)

is exactly the same as

x ~ y + (1|familyID) + (1|familyID:twinID)

There will always be more levels of familyID:twinID than of familyID alone, so the limiting factor is the number of levels of familyID, not twinID.
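The level counts can be checked directly in base R. This is only an illustrative sketch: the data frame below is hypothetical (50 families, 2 twins each, 3 repeated measures per twin), and the column `measure` is an assumed name.

```r
# Hypothetical twin data: 50 families, 2 twins per family, 3 measures per twin
dat <- expand.grid(measure = 1:3, twinID = factor(1:2), familyID = factor(1:50))

# Levels of the upper-level grouping factor
n_family <- nlevels(dat$familyID)

# Levels of the nested factor used by (1|familyID:twinID)
n_twin_in_family <- nlevels(interaction(dat$familyID, dat$twinID, drop = TRUE))

print(c(familyID = n_family, familyID_twinID = n_twin_in_family))
```

Even though twinID itself has only 2 levels, the nested grouping factor has twice as many levels as familyID.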

Mixed Effects Models – Analyzing Longitudinal Data with Nested Random Effects

Assuming that each subject is only in one group, you have a nested design. Conceptually, it makes more sense to treat group as a fixed effect. With only three groups, treating group as random wouldn't make much sense statistically anyway, because you'd be asking the software to estimate a variance for group, assuming a normal distribution, from only three observations.

Your sample size is quite low which might be a problem. Nevertheless, I suggest starting with a simple random intercept model:

lmer(outcome ~ day + group + (1|subject), data = dat)

This model fits a global intercept which is simply the intercept for the reference group, deviations from that intercept for the remaining groups, a single slope for the effect of day and a random intercept for subject. Hence, this model assumes that each group has the same trajectory over time, the same slope but different intercepts. To allow each group their own slope, you could fit the following model with an interaction between day and group:

lmer(outcome ~ day*group + (1|subject), data = dat)

Lastly, you could also allow for random slopes:

lmer(outcome ~ day*group + (day|subject), data = dat)

All these models assume a linear relationship between the outcome and day, which might be unrealistic. If you suspect nonlinear relationships, you could easily accommodate those by using polynomials or (my recommendation) restricted cubic splines (aka natural splines) with, say, 3 knots. Finally, have a look at this post which goes into more details about such longitudinal models.
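As a rough sketch of the spline suggestion, natural splines are available through ns() in the splines package that ships with R. With df = 2, ns() places one interior knot plus the two boundary knots, which gives 3 knots in total. The data below are simulated stand-ins (30 subjects, 10 days, three groups labelled a, b, c, and a curved time trend), all of which are assumptions for illustration.

```r
library(lme4)
library(splines)

set.seed(2)

# Simulated stand-in data (assumed sizes): 30 subjects measured on days 0..9
dat <- expand.grid(day = 0:9, subject = factor(1:30))

# Assign each subject to exactly one of three groups
dat$group <- factor(c("a", "b", "c"))[(as.integer(dat$subject) - 1) %% 3 + 1]

# Curved trend over time plus a random intercept per subject
subj_eff <- rnorm(nlevels(dat$subject))
dat$outcome <- 2 + sqrt(dat$day) +
  subj_eff[as.integer(dat$subject)] +
  rnorm(nrow(dat), sd = 0.5)

# Linear effect of day versus a natural spline with 3 knots in total
m_lin <- lmer(outcome ~ day + group + (1 | subject),
              data = dat, REML = FALSE)
m_ns <- lmer(outcome ~ ns(day, df = 2) + group + (1 | subject),
             data = dat, REML = FALSE)

# Compare the two fits
print(anova(m_lin, m_ns))
```

The spline model spends one extra fixed-effect parameter on the shape of the time trend; whether that is worthwhile depends on the data and on how many distinct days were observed.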
