Solved – Mixed Effects Model: How to Specify a Random Effect Nested Within 2 Factors in R

mixed modelnested datanested-modelsrandom-effects-model

I have some example data that I need to fit using a mixed effects model. I can do this in Minitab, but struggling to specify the model correctly in R. I need 2 factors treated as fixed effects "Teacher" and "School_Type". I also need the interaction of these fixed effects Teacher*School_Type in the model. I am struggling to specify the random effect "Teacher" because it is nested within both "Region" and "School_Type". I think the random effect term specification would be (1|Teacher). To specify that Teacher is nested in Region, I expect that it would be (1|Region/Teacher), this is where my uncertainty creeps in! Further I need "Teacher" nested in both "Region" and "School_Type".

The best I could guess to specify a random effect "Teacher" nested in both "Region" and "School_Type" is (1|Region/School_Type/Teacher). I know this isn't correct. As I have this specified is it treating "Region" and "School_Type" as Random also? How should I properly specify?

Thanks in Advance!-Pat

#LOAD PACKAGE
install.packages("lme4")
library(lme4)

#THE DATA
Region = c(rep("EastUS",8),rep("WestUS",8))
School_Type = c(rep("Private",4),rep("Public",4),rep("Private",4),rep("Public",4))
Teacher =rep(c(1,1,2,2),4)
Class = rep(c(1,2),8)
SR_Score = c(81,85,88,89,68,72,78,75,88,90,91,89,94,97,93,89)
data1 = cbind.data.frame(Region,School_Type,Teacher,Class,SR_Score)

#MIXED EFFECTS MODEL NEEDS TO MODEL 4 EFFECTS
#[1] "Region" (Fixed Effect)
#[2] "School_Type" (Fixed Effect)
#[3] "Region*School_Type" (Fixed Effect, Interaction)
#[4] "Teacher" (Random Effect, Nested in "Region" and "School_Type")

#HERE IS WHAT I HAVE, BUT I KNOW THIS IS INCORRECT
summary(lmer(SR_Score~Region+School_Type+Region*School_Type+(1|Region/School_Type/Teacher),data=data1))

Best Answer

Let's take a step back here. If a factor is a fixed effect, it does not make sense to treat it as random, especially if there are very few observed levels of the factor and the software you are using to fit the model makes a distributional assumption about the random effects (eg. in lme4 random effects are modelled as multivariate normal).

Of course there is always the question of whether a variable is a fixed effect or not. The distinction is not always clear, and context is always important. In this case, it appears that there are only 2 observed levels of School_Type - public and private and these constitute all possible school types. If so, then this will fail most reasonable tests for when to model a factor as random. It is a fixed effect. The same cannot be said so clearly for Region, but based on the OPs simulated dataset, it appears that there are only 2 observed levels of this also, so it is not unreasonable to treat this as fixed too.

This brings us to the crux of the question - nesting. I don't believe it make sense to think of a random factor as being "nested" within a fixed effect, at least not in terms of nesting as it is usually used in a mixed modelling approach in observational studies. What does it mean, statistically, for a random effect to be nested within a fixed effect, other than meaning that each level of a random factor is associated with ("belongs to") a particular level of a fixed factor ? It does not mean that we should model the fixed factor as random. The issue of non-independence is handled by including the factor as a fixed effect. If a particular teacher "belongs" to a particular school type and the School_type is a fixed effect, then it should be treated like any other fixed effect. For example, ethnicity is often included as a covariate in observational studies, so we can say that a person "belongs" to a particular ethnicity category - but we would not think about modelling ethnicity as a random effect simply because of this "nesting".

Presumably, there are multiple measurements for each Teacher. So observations are clustered within teachers, and with a mixed model approach, we should, at a bare minimum, specify random intercepts for Teacher. Thus one possible model is:

SR_Score ~ Region * School_Type + (1|Teacher)  (1)

This will provide estimates for the association of Region, School_Type and the Region:School_Type interaction with SR_Score, while accounting for the non-independence of observations within each Teacher.

The OP mentioned interest in the Teacher:School_Type interaction. I am not sure that I understand this. Since School_Type does not (presumably) change, and teachers are presumably not nested within multiple schools of different types, it makes no sense to include this interaction as a fixed effect - and if there are many teachers this would result in many estimates (one for each teacher-school type combination). One approach would be to allow the intercepts to vary across the Teacher:School_Type combinations like so:

SR_Score ~ Region * School_Type + (1|School_Type:Teacher)    (2)

In this case it is arguable whether School_Type should be retained as a fixed effect - I would normally argue that it shouldn't, so another model is:

SR_Score ~ Region + (1|School_Type:Teacher)    (3)

however, the OP suggests that the Region:School_Type interaction is part of the research question. If so, then I would be inclined to run models (1) and (2) and compare them, first using common sense, and then with, for example, a likelihood ratio test

Finally, if it turns out that there are many more than 2 observed levels of Region, and especially if these are a subset of all regions, then it may make sense to model Region as random, in which case a random effects structure such as (1 | Region / Teacher) may make sense.

Related Question