I am using the R package lme4
to build a mixed-effects model. My data is set up in the following way:
set.seed(1)
df = data.frame(group1 = factor(c(rep(1,36),rep(2,36),rep(3,36),rep(4,36),rep(5,36))),
group2 = factor(rep(c(rep(1,12),rep(2,12),rep(3,12)),5)),
uniqueid = seq(from=1,to=180),
value = rnorm(n = 180, mean = 10, sd = 2))
I have two grouping variables, group1
and group2
. group1
consists of 5 different categories, and group2
consists of 3 different categories. This creates a total of 15 unique combinations of group1
and group2
, and 12 unique observations within each unique combination, like so:
#xtabs(~group1+group2,df)
group2
group1 1 2 3
1 12 12 12
2 12 12 12
3 12 12 12
4 12 12 12
5 12 12 12
My goal is to build a mixed-effects model to get the fixed-effects parameters of being included in group1
and group2
, as the 12 samples within each unique combination are not independent.
Intuitively, I thought to build a model like so:
lme4::lmer(data=df,
formula=value ~ group1 + group2 + (1|group1) + (1|group1:group2))
where group2
is nested within group1
, however, there is nothing inherent about the data structure that suggests it could not also be:
lme4::lmer(data=df,
formula=value ~ group1 + group2 + (1|group2) + (1|group2:group1))
where group1
is nested within group2
. This leads me to believe that I am actually dealing with crossed effects, where the proper model would be built like so:
lme4::lmer(data=df,
formula=value ~ group1 + group2 + (1|group1) + (1|group2))
Other reasons for believing I am dealing with crossed effects are that my group2
categories exist within all levels of the group1
categories, and vice versa. There are not "unique" group2
categories that only exist within certain categories of group1
, although the observations within combination are unique.
EDIT (an analogy): An analogous situation would be if there were 5 unique race categories, say, "White", "Black", "American Indian", "Asian/Pacific Islander", and "Other," along with 3 unique ethnicity categories say, "Hispanic", "Non-Hispanic", and "Other." This would allow for 15 unique combinations. Within each unique combination, there are 12 samples that are dependent within the unique combinations, for a total of 180 samples. This way, there are a total of 60 individuals that are "Hispanic", broken up such that 12 fall into each race category.
The confusion I am running into is related to the answer on a related post: Crossed vs nested random effects: how do they differ and how are they specified correctly in lme4?
where it uses the image:
to describe a crossed random-effect scenario. Here, it seems that the observations within each "Class" are being shared by the "Schools", but this situation differs from my scenario. In my scenario (using the race/ethnicity example), there are unique observations for each combination of race and ethnicity (60 observations for each ethnicity, split evenly into 12 observations for each race.)
Best Answer
I am going to only focus on the question: Is it appropriate to use nested or crossed design between group1 and group2 in random model?
The short answer is to use crossed design.
I think the answer becomes really clear if we understand what is the difference between crossed and nested design. I am going to direct you to these two posts in stats.stackexchange that clearly explain the differences: post 1 and post 2.
Essentially, nested design, as name suggested, means one variable is designed to be within another variable. For example, students are intrinsically nested within school. On the other hand, crossed design means there are no directional relationship. For example, for two variables weights and heights, we cannot specify either one of them is nested within the other. Therefore, we use crossed design.
Back to your scenario, you already made it clear that it is a crossed design. Especially in this part:
This suggests there is no directional relationship between group 1 and group 2 as it works both ways.
Therefore, the correct random model is (1|group1) + (1|group2)
It is similar, except the arrows in the figure go both ways in your analogy and then change "Schools" to ethnicity and "Class" to race.
This is true but remember the it is also true to state "36 observations for each race, split evenly into 12 observations for each ethnicity." This is exactly referring back to "the arrows go both ways".