Solved – Mixed model with partial nesting/partial repeating


Imagine this structure of data:

Each row contains a value for:
student_ID, class_ID, teacher_ID, student_Gender, teacher_Gender, submitted_Evaluation

The DV of interest is submitted_Evaluation, a binary measure of whether a student submitted an evaluation for a particular class.

Let’s say I am interested in the main effects of student_Gender and teacher_Gender, as well as their interaction, on the likelihood that a student will submit an evaluation for a class.

Let’s say that each class is taught by only one teacher, but a teacher can teach many classes. So class_ID is nested in teacher_ID. To an extent, students are nested in classes, but some students take multiple classes (which may be with the same or different teachers). The data are not balanced: there are far more male students than female students, and the mix likely varies by class and teacher. Most students take only one class, but some take many. In this sense there are repeated measures per student, but the repetition isn't structured (e.g., by time).

I don’t substantively care about the specific estimates of the effects of classes or teachers, just the general effects of gender after controlling for these factors.

I have been trying to formulate a mixed model with random intercepts, but am having trouble specifying it because students are partially, but not fully, nested in classes. Specifically, I’m using SAS and trying models with proc glimmix and proc mixed. Efficiency is key as well: the data set isn’t huge (a couple thousand observations), but many of the specifications I try still cause SAS to hang.

For example, SAS returns an error that the model is too large to fit for the following code:

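    proc glimmix data=one;
       class student_ID class_ID teacher_ID student_Gender teacher_Gender;
       /* binary outcome: model P(submitted_Evaluation = 1) on the logit scale */
       model submitted_Evaluation(event="1") = student_Gender teacher_Gender
             student_Gender*teacher_Gender / dist=binary link=logit solution;
       /* random intercept for each student */
       random int / subject=student_ID;
       /* class-within-teacher effects, specified per student */
       random class_ID(teacher_ID) / subject=student_ID;
    run;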
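The `dist=binary link=logit` specification above describes a conditional probability of submitting. The fixed effects plus a student's random intercept form a linear predictor, and the inverse logit turns that predictor into a probability. The following is a minimal Python sketch of that relationship, not of how GLIMMIX fits the model. The 0/1 gender indicators, the function names, and every coefficient value are hypothetical, and SAS's own CLASS parameterization differs in detail.

```python
import math


def inv_logit(eta: float) -> float:
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def p_submit(student_female: int, teacher_female: int, u_student: float,
             b0: float, b_sg: float, b_tg: float, b_int: float) -> float:
    """Conditional probability that a student submits an evaluation.

    student_female / teacher_female are hypothetical 0/1 indicators;
    u_student is that student's random intercept; all coefficients are
    illustrative values, not estimates from any real fit.
    """
    eta = (b0
           + b_sg * student_female
           + b_tg * teacher_female
           + b_int * student_female * teacher_female
           + u_student)
    return inv_logit(eta)


# Female student, male teacher, average student (u = 0): eta = -0.5 + 0.5 = 0
print(p_submit(1, 0, 0.0, b0=-0.5, b_sg=0.5, b_tg=0.2, b_int=0.1))  # 0.5
```

Raising `u_student` raises the probability for that student. This is how a random intercept captures students who are generally more (or less) likely to submit, whatever the genders involved.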

But really, I'm just not sure what types of random or repeated effects I should be specifying. I am new to mixed models and have been "trying out" a bunch, but can't seem to figure out what really makes sense.

If anyone has any ideas, I would appreciate your input!

Best Answer

You might find it useful to distinguish random-effects factors that are nested (which is more common) from random-effects factors that are crossed (which is less common). If teacher_id, student_id, and class_id are random-effects factors, then I (like you) would think of class_id as nested within teacher_id, and of student_id as crossed with class_id(teacher_id). Hence I would consider

    random teacher_id
        class_id(teacher_id)
        student_id
        student_id*class_id(teacher_id);
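You can check directly from the data whether one factor is nested in another or crossed with it. One factor is nested in another if each of its levels occurs with exactly one level of the other. A minimal Python sketch follows. It assumes each row is a dictionary keyed by the question's column names and that class_ID values are unique across teachers. The helper names and toy rows are hypothetical.

```python
from collections import defaultdict


def is_nested(rows, inner, outer):
    """True if every level of `inner` occurs with exactly one level of `outer`."""
    parents = defaultdict(set)
    for row in rows:
        parents[row[inner]].add(row[outer])
    return all(len(p) == 1 for p in parents.values())


def students_in_multiple_classes(rows):
    """Students observed in more than one class: the source of the crossing."""
    classes = defaultdict(set)
    for row in rows:
        classes[row["student_ID"]].add(row["class_ID"])
    return sorted(s for s, c in classes.items() if len(c) > 1)


# Hypothetical toy data: C1 and C2 are taught by T1, C3 by T2;
# student S1 takes two classes with different teachers.
rows = [
    {"student_ID": "S1", "class_ID": "C1", "teacher_ID": "T1"},
    {"student_ID": "S1", "class_ID": "C3", "teacher_ID": "T2"},
    {"student_ID": "S2", "class_ID": "C1", "teacher_ID": "T1"},
    {"student_ID": "S3", "class_ID": "C2", "teacher_ID": "T1"},
]

print(is_nested(rows, "class_ID", "teacher_ID"))  # True: class within teacher
print(is_nested(rows, "student_ID", "class_ID"))  # False: S1 is in two classes
print(students_in_multiple_classes(rows))         # ['S1']
```

Class IDs may be reused across teachers (class 1 for teacher A, class 1 for teacher B). In that case, check the (teacher_ID, class_ID) pair instead, which is what the class_id(teacher_id) notation expresses.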

If the model uses a normal distribution, then student_id*class_id(teacher_id) would be omitted from the random statement because it would be the residual variance. Your model uses a binary distribution and so has no residual variance; ideally the random statement would include student_id*class_id(teacher_id).
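Why does the student_id*class_id(teacher_id) term play the role of the residual? When each student makes one evaluation decision per class, that term has exactly one level per observation. A hypothetical Python check on toy data (column names taken from the question, helper name assumed) makes this concrete:

```python
def n_levels(rows, *factors):
    """Number of distinct level combinations of the given factors."""
    return len({tuple(row[f] for f in factors) for row in rows})


# Hypothetical toy data: one submit/no-submit outcome per student-class pair.
rows = [
    {"student_ID": "S1", "class_ID": "C1", "teacher_ID": "T1", "submitted": 1},
    {"student_ID": "S1", "class_ID": "C3", "teacher_ID": "T2", "submitted": 0},
    {"student_ID": "S2", "class_ID": "C1", "teacher_ID": "T1", "submitted": 1},
    {"student_ID": "S3", "class_ID": "C2", "teacher_ID": "T1", "submitted": 0},
]

cells = n_levels(rows, "student_ID", "class_ID", "teacher_ID")
print(cells, len(rows))  # 4 4: one student*class(teacher) level per observation
```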

Adding two more random terms to your model will certainly not help with the model-size problem. You might find success with the HPMIXED procedure; quoting the documentation, "The HPMIXED procedure is specifically designed to cope with estimation problems involving a large number of fixed effects, a large number of random effects, or a large number of observations." Unfortunately, HPMIXED assumes normal distributions. But Xie and Madden (2014) come to the rescue: they have written a macro, %HPGLIMMIX, for generalized linear mixed models:

http://www.jstatsoft.org/article/view/v058i08/v58i08.pdf

I have never used this macro, and I have no idea whether the extensive unbalance and incompleteness of this design structure will be problematic. So I wish you good luck!

Related Solutions

Mixed Effects Model – How to Handle Nested Data with Mixed Effects Model in R

I think this is correct.

(1|Tree/Organ/Sample) expands to/is equivalent to (1|Tree)+(1|Tree:Organ)+(1|Tree:Organ:Sample) (where : denotes an interaction).
The fixed factors Treatment, Organ and Tissue automatically get handled at the correct level.
You should probably include Site as a fixed effect (conceptually it's a random effect, but it's not practical to try to estimate among-site variance with only two sites); this will reduce the among-tree variance slightly.
You should probably include all the data within a data frame, and pass this explicitly to lmer via a data=my.data.frame argument.
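The expansion of the `/` shorthand described above is purely mechanical. As a rough illustration, the following Python function rewrites a simple nested term into its explicit interaction form. It is a hypothetical helper, not part of lme4. It handles only the plain `(lhs|A/B/...)` pattern, not other formula features.

```python
def expand_nested(term: str) -> str:
    """Expand lme4-style nested shorthand into explicit terms.

    (1|A/B/C) -> (1|A) + (1|A:B) + (1|A:B:C)
    """
    inner = term.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    lhs, rhs = (part.strip() for part in inner.split("|", 1))
    factors = [f.strip() for f in rhs.split("/")]
    # Each successive term adds one more level of nesting, joined with ':'.
    pieces = [f"({lhs}|{':'.join(factors[:i])})" for i in range(1, len(factors) + 1)]
    return " + ".join(pieces)


print(expand_nested("(1|Tree/Organ/Sample)"))
# (1|Tree) + (1|Tree:Organ) + (1|Tree:Organ:Sample)
```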

You may find the GLMM FAQ helpful (it's focused on GLMMs but also covers material relevant to linear mixed models).

Solved – Mixed Effects Model: How to Specify a Random Effect Nested Within 2 Factors in R

Let's take a step back here. If a factor is a fixed effect, it does not make sense to treat it as random, especially if there are very few observed levels of the factor and the software you are using to fit the model makes a distributional assumption about the random effects (e.g., in lme4 random effects are modelled as multivariate normal).

Of course there is always the question of whether a variable is a fixed effect or not. The distinction is not always clear, and context is always important. In this case, it appears that there are only 2 observed levels of School_Type, public and private, and these constitute all possible school types. If so, then this will fail most reasonable tests for when to model a factor as random: it is a fixed effect. The same cannot be said so clearly for Region, but based on the OP's simulated dataset, it appears that there are only 2 observed levels of this also, so it is not unreasonable to treat it as fixed too.

This brings us to the crux of the question: nesting. I don't believe it makes sense to think of a random factor as being "nested" within a fixed effect, at least not in terms of nesting as it is usually used in a mixed modelling approach in observational studies. What does it mean, statistically, for a random effect to be nested within a fixed effect, other than meaning that each level of a random factor is associated with ("belongs to") a particular level of a fixed factor? It does not mean that we should model the fixed factor as random. The issue of non-independence is handled by including the factor as a fixed effect. If a particular teacher "belongs" to a particular school type and School_Type is a fixed effect, then it should be treated like any other fixed effect. For example, ethnicity is often included as a covariate in observational studies, so we can say that a person "belongs" to a particular ethnicity category, but we would not think about modelling ethnicity as a random effect simply because of this "nesting".

Presumably, there are multiple measurements for each Teacher. So observations are clustered within teachers, and with a mixed model approach, we should, at a bare minimum, specify random intercepts for Teacher. Thus one possible model is:

    SR_Score ~ Region * School_Type + (1|Teacher)    # model (1)

This will provide estimates for the association of Region, School_Type and the Region:School_Type interaction with SR_Score, while accounting for the non-independence of observations within each Teacher.
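Under model (1), two scores from the same teacher share that teacher's random intercept. That shared intercept is what accounting for non-independence means here. The resulting within-teacher correlation (the intraclass correlation) is var_teacher / (var_teacher + var_residual). Below is a minimal Python simulation that shows the empirical within-teacher correlation landing near that value. It assumes hypothetical variances of 1 and 3 and omits the fixed effects.

```python
import random


def icc(var_teacher: float, var_resid: float) -> float:
    """Correlation between two scores on the same teacher (random-intercept model)."""
    return var_teacher / (var_teacher + var_resid)


def simulate_pairs(n_teachers: int, var_teacher: float, var_resid: float, seed: int = 1):
    """Two scores per teacher: score = u_teacher + e (fixed part set to 0)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_teachers):
        u = rng.gauss(0.0, var_teacher ** 0.5)  # shared teacher intercept
        pairs.append((u + rng.gauss(0.0, var_resid ** 0.5),
                      u + rng.gauss(0.0, var_resid ** 0.5)))
    return pairs


def pearson(pairs):
    """Plain Pearson correlation of the two scores within each pair."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5


pairs = simulate_pairs(5000, var_teacher=1.0, var_resid=3.0)
print(icc(1.0, 3.0))   # 0.25
print(pearson(pairs))  # close to 0.25, up to sampling noise
```

Treating these pairs as independent would overstate the effective sample size. The (1|Teacher) term is what prevents that.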

The OP mentioned interest in the Teacher:School_Type interaction. I am not sure that I understand this. Since School_Type does not (presumably) change, and teachers are presumably not nested within multiple schools of different types, it makes no sense to include this interaction as a fixed effect - and if there are many teachers this would result in many estimates (one for each teacher-school type combination). One approach would be to allow the intercepts to vary across the Teacher:School_Type combinations like so:

    SR_Score ~ Region * School_Type + (1|School_Type:Teacher)    # model (2)

In this case it is arguable whether School_Type should be retained as a fixed effect - I would normally argue that it shouldn't, so another model is:

    SR_Score ~ Region + (1|School_Type:Teacher)    # model (3)

However, the OP suggests that the Region:School_Type interaction is part of the research question. If so, then I would be inclined to run models (1) and (2) and compare them, first using common sense and then with, for example, a likelihood ratio test.

Finally, if it turns out that there are many more than 2 observed levels of Region, and especially if these are a subset of all regions, then it may make sense to model Region as random, in which case a random effects structure such as (1 | Region / Teacher) may make sense.
