Solved – Mixed model with partial nesting/partial repeating

Tags: mixed model, nested data, random-effects-model, repeated measures, sas

Imagine this structure of data:

Each row contains a value for:
student_ID, class_ID, teacher_ID, student_Gender, teacher_Gender, submitted_Evaluation

The DV of interest is submitted_Evaluation, a binary measure of whether a student submitted an evaluation for a particular class.

Let’s say I am interested in the main effects of student_Gender and teacher_Gender, as well as their interaction, on the likelihood that a student will submit an evaluation for a class.

Let’s say that each class is only taught by one teacher, but a teacher can teach many classes. So, class_ID is nested in teacher_ID. To an extent, students are nested in classes, but there are some students who take multiple classes (which may be with the same or different teachers). The data are not balanced – i.e., there are far more male students than female students, and the mix likely varies by class and teacher. Most students take only one class, but some take many. In this sense, there are repeated measures per student, but the repetition isn't structured (e.g., by time).

I don’t substantively care about the specific estimates of the effects of classes or teachers – just the general effects of gender after controlling for these factors.

I have been trying to formulate a mixed model with random intercepts, but am having trouble specifying the model as students are partially but not fully nested in classes. Specifically, I’m using SAS, and trying models using proc glimmix and proc mixed. Efficiency is key as well – the data set isn’t huge (a couple thousand observations) but many of the specifications I try still cause SAS to hang.

For example, SAS returns an error that the model is too large to fit for the following code:

proc glimmix data=one;
   class student_ID class_ID teacher_ID student_Gender teacher_Gender;
   model submitted_Evaluation(event="1") = student_Gender teacher_Gender student_Gender*teacher_Gender / dist=binary link=logit solution;
   random int / subject=student_ID;
   random class_ID(teacher_ID) / subject=student_ID;
run;

But really, I'm just not sure what types of random or repeated effects I should be specifying — I am new to mixed models and have been "trying out" a bunch, but can't seem to figure out what really makes sense.

If anyone has any ideas, I would appreciate your input!

Best Answer

You might find it useful to think about random effects factors that are nested (which is more common) and random effects factors that are crossed (which is less common). If teacher_id, student_id, and class_id are random effects factors, then I (like you) would think of class_id being nested within teacher_id, and student_id being crossed with class_id(teacher_id). Hence I would consider

    random teacher_id
        class_id(teacher_id)
        student_id
        student_id*class_id(teacher_id);

If the model uses a normal distribution, then student_id*class_id(teacher_id) would be omitted from the random statement because it would be the residual variance. Your model uses a binary distribution and so has no residual variance; ideally the random statement would include student_id*class_id(teacher_id).
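Putting the pieces together, a minimal sketch of how this random statement might sit inside the questioner's PROC GLIMMIX call is shown below. It assumes the data set is named one and that the response variable is submitted_Evaluation (the name used in the question's description), and it uses the procedure's default estimation method. It has not been run on the questioner's data, and it does nothing by itself to reduce the model size.

```sas
/* Sketch only: data set and variable names are taken from the question. */
proc glimmix data=one;
   class student_ID class_ID teacher_ID student_Gender teacher_Gender;

   /* Fixed effects: both genders and their interaction. */
   model submitted_Evaluation(event="1") =
         student_Gender teacher_Gender student_Gender*teacher_Gender
         / dist=binary link=logit solution;

   /* Random intercepts: classes nested in teachers, students crossed with them. */
   random teacher_ID
          class_ID(teacher_ID)
          student_ID;

   /* Fourth term suggested above. It has one level per student-class   */
   /* combination, so it enlarges the model; uncomment it only if the   */
   /* simpler model fits.                                               */
   /* random student_ID*class_ID(teacher_ID); */
run;
```

Starting without the student_ID*class_ID(teacher_ID) term is a pragmatic choice for checking whether the model fits at all, not a recommendation to leave it out.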

Adding two more random terms to your model certainly is not going to help with the problem with model size. You might find success with the HPMIXED procedure: citing the documentation, "The HPMIXED procedure is specifically designed to cope with estimation problems involving a large number of fixed effects, a large number of random effects, or a large number of observations." Unfortunately, HPMIXED assumes normal distributions. But to the rescue come Xie and Madden (2014) who have written a macro %HPGLIMMIX for generalized linear mixed models:

http://www.jstatsoft.org/article/view/v058i08/v58i08.pdf

I have never used this macro, and I have no idea whether the extensive unbalance and incompleteness of this design structure will be problematic. So I wish you good luck!
