Mixed Models – Nested Crossed Random Effects for Repeated Measures Data in R

Tags: lme4-nlme, mixed model, multilevel-analysis, r

Problem

There are two excellent CV posts on specifying crossed effects models (post 1, post 2).

The issue I'm trying to wrestle with pertains to part of the answer to post 2, in particular how to nest crossed random effects.

In my study, I have:

  • About 20 individuals per site
  • About 10 sites
  • Within each site, there were about 20 samples

The outcome is each participant's "interest" (the study is about out-of-school programs).

Because there are dependencies by both participant and sample, I think there are two crossed random effects: one for the observations associated with each individual and one for the observations associated with each sample. The hard part for me is that both of these random effects are nested within one of the 10 sites (programs).

The samples were taken at the same time for all individuals within a site, but at different times at different sites, so sample 1 at site A did not coincide with sample 1 at another site in any sense (neither the same date and time nor the same interval from the "start" of the site's activities). Therefore, to create the variable identifying each sample, I combined the site variable, the date the sample was collected, and another variable specifying whether it was the 1st, 2nd, 3rd, or 4th sample collected on that date. The result is a factor.

The data (in R) are as follows:

# A tibble: 2,970 × 4
    interest participant_ID  site           sample
   <dbl+lbl>          <dbl> <chr>         <fctr>
1          2           1001     1 1-2015-07-14-1
2          2           1001     1 1-2015-07-14-2
3          4           1001     1 1-2015-07-15-1
4          3           1001     1 1-2015-07-15-2
5          3           1001     1 1-2015-07-21-1
6          1           1001     1 1-2015-07-21-2
7          3           1001     1 1-2015-07-21-4
8          3           1001     1 1-2015-07-22-1
9          4           1001     1 1-2015-07-22-4
10         3           1001     1 1-2015-07-28-1
# ... with 2,960 more rows

Possible Solution

In the answer to post 2, the author of the selected answer wrote:

Because you do not have unique values of the tow variable (i.e.
because as you say below tows are specified as 1, 2, 3 at every
station), you do need to specify the nesting, as (1|station:tow:day).
If you did have the tows specified uniquely, you could use either
(1|tow:day) or (1|station:tow:day) (they should give equivalent
answers).

Mapping this to my example: I do have unique values of the sample variable (the analogue of tow), so it seems I do not need to specify the nesting explicitly. Still, I'm having trouble specifying this model mathematically and, thus, in terms of model syntax. (I am using lme4 in R.)
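The claim that the sample labels are unique across sites can be checked directly. Below is a minimal sketch in base R on a toy data frame; `toy` is a hypothetical stand-in for `df`, and the site 2 labels are invented for illustration.

```r
# Toy stand-in for the real data (hypothetical): sample labels embed the site,
# in the same style as the tibble shown above.
toy <- data.frame(
  site   = c("1", "1", "1", "2", "2", "2"),
  sample = c("1-2015-07-14-1", "1-2015-07-14-2", "1-2015-07-15-1",
             "2-2015-07-16-1", "2-2015-07-16-2", "2-2015-07-17-1")
)

# Number of distinct sites in which each sample label occurs.
# If every count is 1, sample is nested within site.
sites_per_sample <- tapply(toy$site, toy$sample, function(s) length(unique(s)))
sample_is_nested <- all(sites_per_sample == 1)
sample_is_nested

# With unique labels, combining site and sample creates no new groups,
# so (1 | sample) and (1 | site:sample) would define the same grouping factor.
same_grouping <- nlevels(interaction(toy$site, toy$sample, drop = TRUE)) ==
  nlevels(factor(toy$sample))
same_grouping
```

If both checks come back TRUE on the real data, the nesting is already encoded in the labels and does not have to be spelled out in the formula.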

These seem to be the options:

  1. Not nesting the crossed random effects within the site because the sample variable includes a site identifier:

    lmer(interest ~ 1 + (1|participant_ID) + (1|sample), data = df)

  2. Creating the sample variable without a site identifier but in a way so that samples within each site were still identified uniquely and nesting the crossed random effects within the site:

    lmer(interest ~ 1 + (1|site/participant_ID) + (1|site/sample), data = df)

Other examples interact the crossed random effects by adding a term such as (1|participant_ID:sample).
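Whether such an interaction term can be estimated at all depends on replication. A rough check, assuming each participant gives at most one response per sample (the toy data and participant 1002 are hypothetical):

```r
# Hypothetical toy data: two participants, each responding once to two samples.
toy <- data.frame(
  participant_ID = c(1001, 1001, 1002, 1002),
  sample = c("1-2015-07-14-1", "1-2015-07-14-2",
             "1-2015-07-14-1", "1-2015-07-14-2")
)

# Observations per participant-by-sample cell.
cell_counts <- table(toy$participant_ID, toy$sample)
max_per_cell <- max(cell_counts)
max_per_cell

# With at most one observation per cell, the interaction has as many
# levels as there are rows in the data.
n_cells <- nlevels(interaction(toy$participant_ID, toy$sample, drop = TRUE))
one_level_per_row <- n_cells == nrow(toy)
one_level_per_row
```

When there is one level per row, a (1|participant_ID:sample) intercept cannot be separated from the residual in a Gaussian model, and lmer by default stops with an error about the number of levels of the grouping factor.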

Would either of these account for the dependencies by both participant and sample? Or are there other options or better ways to model this?

Best Answer

Here's my read on what experiment was done and how I would model it.

Each observation has three identifiers and one value.

You stated that these identifiers are participant, site, and sample and the value is interest.

You explained that there are many levels of each of these identifiers, and you expect that measurements sharing a value of any one of them (same participant, same site, or same sample) are likely to be more similar than observations that have none in common.
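That expectation can be written as a variance-components model. The following is a sketch in my own notation (none of the symbols come from the question): y_{pt} is the interest of participant p at sample t, and s is the site that both p and t belong to.

```latex
\[
y_{pt} = \beta_0 + a_{s} + b_{p} + c_{t} + \varepsilon_{pt},
\qquad
a_{s} \sim N(0, \sigma_a^2),\;
b_{p} \sim N(0, \sigma_b^2),\;
c_{t} \sim N(0, \sigma_c^2),\;
\varepsilon_{pt} \sim N(0, \sigma^2)
\]
\[
\operatorname{Cov}(y_{pt},\, y_{p't'}) =
\sigma_a^2 \,[\text{same site}]
+ \sigma_b^2 \,[p = p']
+ \sigma_c^2 \,[t = t']
\quad \text{for two distinct observations}
\]
```

Under these assumptions, two observations from the same participant share both the site and the participant variance, two observations at the same sample share the site and the sample variance, and observations from different sites are uncorrelated.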

This sounds like a perfect situation for an LMM with a random intercept for each of those factors. Thus, the model I would fit is:

lmer(formula = interest ~ (1|participant_ID) + (1|site) + (1|sample),
     data = df)
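To see what this model returns, here is a small simulation sketch. The design (every participant responding to every sample at their site), the standard deviations, and the name `sim` are all assumptions made for illustration, not properties of the real data.

```r
library(lme4)

set.seed(1)

# Hypothetical balanced design: 10 sites, 20 participants and 20 samples per site.
n_site <- 10
n_part <- 20
n_samp <- 20

sim <- expand.grid(part = seq_len(n_part),
                   samp = seq_len(n_samp),
                   site = seq_len(n_site))

# Labels embed the site, so participants and samples are unique across sites.
sim$participant_ID <- factor(paste(sim$site, sim$part, sep = "-"))
sim$sample <- factor(paste(sim$site, sim$samp, sep = "-"))
sim$site <- factor(sim$site)

# Assumed true effects: one draw per level of each grouping factor.
site_eff <- rnorm(nlevels(sim$site), sd = 0.5)
part_eff <- rnorm(nlevels(sim$participant_ID), sd = 0.7)
samp_eff <- rnorm(nlevels(sim$sample), sd = 0.4)

sim$interest <- 2.5 +
  site_eff[as.integer(sim$site)] +
  part_eff[as.integer(sim$participant_ID)] +
  samp_eff[as.integer(sim$sample)] +
  rnorm(nrow(sim), sd = 1)

# Same random-effects structure as the model above.
fit <- lmer(interest ~ 1 + (1 | participant_ID) + (1 | site) + (1 | sample),
            data = sim)

# One estimated variance per grouping factor, plus the residual.
vc <- as.data.frame(VarCorr(fit))
vc[, c("grp", "vcov", "sdcor")]
```

A note on option 2 in the question: the shorthand (1|a/b) expands to (1|a) + (1|a:b), so (1|site/participant_ID) + (1|site/sample) contains the (1|site) term twice. If participants and samples are labelled uniquely across sites, dropping the duplicate should leave the same model as the one fitted here.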

EDIT: deleted misunderstandings.