Random Effects – Identifying Nested or Crossed in Mixed Models

Tags: categorical data, crossed-random-effects, mixed model, nested data, r

I am using the R package lme4 to build a mixed-effects model. My data is set up in the following way:

set.seed(1)
df = data.frame(group1 = factor(c(rep(1,36),rep(2,36),rep(3,36),rep(4,36),rep(5,36))),
                group2 = factor(rep(c(rep(1,12),rep(2,12),rep(3,12)),5)),
                uniqueid = seq(from=1,to=180),
                value = rnorm(n = 180, mean = 10, sd = 2))

I have two grouping variables, group1 and group2. group1 consists of 5 different categories, and group2 consists of 3 different categories. This creates a total of 15 unique combinations of group1 and group2, and 12 unique observations within each unique combination, like so:

#xtabs(~group1+group2,df)

      group2
group1  1  2  3
     1 12 12 12
     2 12 12 12
     3 12 12 12
     4 12 12 12
     5 12 12 12

My goal is to build a mixed-effects model that estimates the fixed effects of membership in group1 and group2 while accounting for the fact that the 12 samples within each unique combination are not independent.

Intuitively, I thought to build a model like so:

lme4::lmer(data=df,
           formula=value ~ group1 + group2 + (1|group1) + (1|group1:group2))

where group2 is nested within group1. However, nothing inherent in the data structure suggests it could not also be:

lme4::lmer(data=df,
           formula=value ~ group1 + group2 + (1|group2) + (1|group2:group1))

where group1 is nested within group2. This leads me to believe that I am actually dealing with crossed effects, where the proper model would be built like so:

lme4::lmer(data=df,
           formula=value ~ group1 + group2 + (1|group1) + (1|group2))

Another reason for believing I am dealing with crossed effects is that my group2 categories exist within all levels of group1, and vice versa. There are no "unique" group2 categories that only exist within certain categories of group1, although the observations within each combination are unique.

EDIT (an analogy): An analogous situation would be if there were 5 unique race categories, say, "White", "Black", "American Indian", "Asian/Pacific Islander", and "Other," along with 3 unique ethnicity categories, say, "Hispanic", "Non-Hispanic", and "Other." This would allow for 15 unique combinations. Within each unique combination, there are 12 samples that are dependent on one another, for a total of 180 samples. This way, there are a total of 60 individuals that are "Hispanic", broken up such that 12 fall into each race category.

The confusion I am running into is related to the answer on a related post: Crossed vs nested random effects: how do they differ and how are they specified correctly in lme4?
where it uses the image:

[image from the linked answer]

to describe a crossed random-effect scenario. Here, it seems that the observations within each "Class" are being shared by the "Schools", but this situation differs from my scenario. In my scenario (using the race/ethnicity example), there are unique observations for each combination of race and ethnicity (60 observations for each ethnicity, split evenly into 12 observations for each race.)

Best Answer

I am going to focus on one question only: is a nested or a crossed design appropriate for group1 and group2 in the random part of the model?

The short answer: use a crossed design.

I think the answer becomes clear once we understand the difference between crossed and nested designs. These two posts on stats.stackexchange explain the differences clearly: post 1 and post 2.

Essentially, a nested design, as the name suggests, means that one variable sits within another. For example, students are intrinsically nested within schools. A crossed design, on the other hand, means there is no directional relationship. For example, with two variables such as weight and height, we cannot say that either one is nested within the other, so we use a crossed design.
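One way to check this distinction on actual data is to cross-tabulate the two factors. The following is a minimal sketch in base R that rebuilds the group1/group2 layout from the question. The helper names is_nested_in and is_fully_crossed are hypothetical and are not part of lme4.

```r
# Rebuild the grouping layout from the question (5 x 3 levels, 12 rows per cell).
df <- data.frame(
  group1 = factor(rep(1:5, each = 36)),
  group2 = factor(rep(rep(1:3, each = 12), times = 5))
)

# Hypothetical helper: `inner` is nested in `outer` when every level of
# `inner` occurs under exactly one level of `outer`.
is_nested_in <- function(inner, outer) {
  tab <- table(inner, outer)
  all(rowSums(tab > 0) == 1)
}

# Hypothetical helper: fully crossed means every combination of levels
# is observed at least once.
is_fully_crossed <- function(f1, f2) {
  all(table(f1, f2) > 0)
}

nested_2_in_1 <- is_nested_in(df$group2, df$group1)   # group2 within group1?
nested_1_in_2 <- is_nested_in(df$group1, df$group2)   # group1 within group2?
crossed       <- is_fully_crossed(df$group1, df$group2)

# For contrast: a label that identifies each group1:group2 cell IS nested
# within group1, because each cell belongs to exactly one group1 level.
cell <- interaction(df$group1, df$group2, drop = TRUE)
cell_nested_in_1 <- is_nested_in(cell, df$group1)

print(c(nested_2_in_1 = nested_2_in_1,
        nested_1_in_2 = nested_1_in_2,
        crossed = crossed,
        cell_nested_in_1 = cell_nested_in_1))
```

With this layout neither factor is nested in the other, while the per-cell label is nested in group1 by construction.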

Back to your scenario: you have already made it clear that it is a crossed design, especially in this part:

however, there is nothing inherent about the data structure that suggests it could not also be: lme4::lmer(data=df, formula=value ~ group1 + group2 + (1|group2) + (1|group2:group1))

This suggests there is no directional relationship between group1 and group2, since the nesting could be written either way.

Therefore, the correct random-effects specification is (1|group1) + (1|group2).
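A minimal sketch of fitting that random part, assuming the lme4 package is installed (the block does nothing otherwise). The intercept-only fixed part is a simplification for illustration and is not something stated above. Because the simulated values are pure noise and the factors have only 5 and 3 levels, lmer may report a singular fit here.

```r
# Assumes lme4 is installed; the whole block is skipped if it is not.
if (requireNamespace("lme4", quietly = TRUE)) {
  set.seed(1)
  df <- data.frame(
    group1 = factor(rep(1:5, each = 36)),
    group2 = factor(rep(rep(1:3, each = 12), times = 5)),
    value  = rnorm(n = 180, mean = 10, sd = 2)
  )

  # Crossed random intercepts: one term per factor, no group1:group2 term.
  # The intercept-only fixed part is an assumption made for this sketch.
  fit <- lme4::lmer(value ~ 1 + (1 | group1) + (1 | group2), data = df)

  # Number of levels behind each random-effects term.
  n_levels <- lme4::ngrps(fit)
  print(n_levels)

  # Estimated variance components for the two crossed terms.
  print(lme4::VarCorr(fit))
}
```

The level counts reported by ngrps should match the 5 categories of group1 and the 3 categories of group2, which confirms that each factor enters as its own grouping term rather than as a 15-level combination.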

Here, it seems that the observations within each "Class" are being shared by the "Schools", but this situation differs from my scenario.

It is similar: in your analogy, replace "Schools" with ethnicity and "Class" with race, and let the arrows in the figure go both ways.

In my scenario (using the race/ethnicity example), there are unique observations for each combination of race and ethnicity (60 observations for each ethnicity, split evenly into 12 observations for each race.)

This is true, but remember that it is equally true to say "36 observations for each race, split evenly into 12 observations for each ethnicity." This is exactly what "the arrows go both ways" refers to.
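The two ways of slicing the same 180 observations can be checked with the margins of the cross-tabulation. This is a small base R sketch that reuses the layout from the question, with group1 standing in for race and group2 for ethnicity.

```r
# Same layout as in the question: 5 x 3 cells with 12 observations each.
df <- data.frame(
  group1 = factor(rep(1:5, each = 36)),
  group2 = factor(rep(rep(1:3, each = 12), times = 5))
)

cells <- xtabs(~ group1 + group2, data = df)

# Row margins: observations per group1 level (3 cells of 12 each).
per_group1 <- rowSums(cells)

# Column margins: observations per group2 level (5 cells of 12 each).
per_group2 <- colSums(cells)

# Print the table with both margins attached.
print(addmargins(cells))
```

Both margins are built from the same 15 cells, so neither factor has a privileged role as the "outer" grouping.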
