Solved – Specifying Mixed Effects Model Formulas

Tags: mixed-model, r, repeated-measures, statsmodels

I have a dataset with experiment groups, tests, schools, and students. Students appear in the dataset multiple times because they take multiple tests, so the observations are not i.i.d. I want to specify a mixed effects model formula for this data.

I think it makes sense to treat the experiment group as a fixed effect and tests, schools, and students as random effects.

Would this mixed effects formula be correct?

score ~ group + (1|test) + (1|test:school) + (1|school:student)

And how would I specify that same formula using Statsmodels? Right now I'm trying the following, but I don't totally understand how a mixed effects formula translates into statsmodels, and I think I'm missing the mark here:

import statsmodels.formula.api as smf

vc = {'school': '0 + C(school)', 'student': '0 + C(student)'}
re_formula = '1'

model = smf.mixedlm('score ~ group', test_scores_df, re_formula=re_formula, vc_formula=vc, groups=test_scores_df['test'])

There are ~20,000 total observations, with ~2,000 unique tests, ~50 unique schools, and ~5,000 unique students.

As you can see, there are quite a lot of gaps in the data: some students take only some of the tests, some have data for several different tests, and so on.

Students are nested in schools, but I believe tests are crossed with respect to both students and schools, as Kerby Shedden specified below. However, I believe they are only partially crossed, since not all students/schools receive the same tests.

For example, Test A might have been given to students 10001, 10002, 10003, 10004 in schools 151, 152, 153, while Test B might have been given to students 10002, 10003, 10004, 10005 in schools 152, 153, 154. Students are definitely nested within schools, though. Would this still be specified the same way as a fully crossed design?
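The nesting and crossing structure described above can be checked directly from the data. The following is a minimal sketch using pandas on the Test A / Test B example; the student-to-school mapping is a hypothetical assignment made up for illustration, since the question only lists which students and which schools received each test.

```python
import pandas as pd

# Hypothetical mapping of students to schools (not given in the question).
school_of = {10001: 151, 10002: 152, 10003: 153, 10004: 153, 10005: 154}

# Which students received which test, as in the example above.
takers = {
    "A": [10001, 10002, 10003, 10004],
    "B": [10002, 10003, 10004, 10005],
}

rows = [
    {"test": test, "student": student, "school": school_of[student]}
    for test, students in takers.items()
    for student in students
]
df = pd.DataFrame(rows)

# Nested: every student belongs to exactly one school.
students_nested = bool((df.groupby("student")["school"].nunique() == 1).all())

# Crossed: a single test is seen in several schools.
schools_per_test = df.groupby("test")["school"].nunique()

# Fully crossed only if every student/test combination is observed.
n_observed = len(df.drop_duplicates(["student", "test"]))
n_possible = df["student"].nunique() * df["test"].nunique()
fully_crossed = n_observed == n_possible

print("students nested in schools:", students_nested)
print("schools per test:", schools_per_test.to_dict())
print("student x test cells observed:", n_observed, "of", n_possible)
print("fully crossed:", fully_crossed)
```

Running the same checks on the real data frame (with its own column names) would show how sparse the student-by-test crossing actually is.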

Best Answer

A mixed effects model with random intercepts for students nested in schools, plus random intercepts for tests (a crossed factor), should be appropriate here. I don't know statsmodels, but in the standard formula syntax used by R packages such as lme4, the model would look like this:

score ~ group + (1 | test) + (1 | school) + (1 | school:student)
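For the statsmodels side of the question, one possible translation of the formula above is sketched here. It is not taken from the answer. It relies on the usual statsmodels workaround for crossed effects: put every observation into a single group and express each random intercept as a variance component through vc_formula. The toy data, the column name all_obs, and the simulated effect sizes are all assumptions for illustration. The sketch also assumes student IDs are unique across schools, so that C(student) plays the role of school:student.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical toy data: 6 schools, 5 students per school, 8 tests.
# Each student takes a random subset of 4 tests (partially crossed).
n_schools, n_per_school, n_tests, n_taken = 6, 5, 8, 4
school_eff = rng.normal(0, 1.0, n_schools)
test_eff = rng.normal(0, 1.0, n_tests)

rows = []
for school in range(n_schools):
    for k in range(n_per_school):
        student = school * n_per_school + k  # ID unique across schools
        student_eff = rng.normal(0, 1.0)
        group = "treatment" if student % 2 else "control"
        for test in rng.choice(n_tests, size=n_taken, replace=False):
            score = (50 + 2.0 * (group == "treatment") + school_eff[school]
                     + student_eff + test_eff[test] + rng.normal(0, 1.0))
            rows.append({"score": score, "group": group, "school": school,
                         "student": student, "test": int(test)})
df = pd.DataFrame(rows)

# Single grouping column so that all factors can be crossed freely.
df["all_obs"] = 1

# One variance component (random intercept) per factor.
vc = {
    "test": "0 + C(test)",
    "school": "0 + C(school)",
    "student": "0 + C(student)",
}

model = smf.mixedlm("score ~ group", df, groups="all_obs", vc_formula=vc)
result = model.fit()

print(result.fe_params)  # fixed effects: intercept and group
print(result.vcomp)      # estimated variances of the three components
```

With a single group, statsmodels works with the full covariance matrix of all observations, so this approach may be slow or memory-hungry at the scale described in the question (~20,000 rows, ~5,000 students); lme4 is designed to handle crossed random effects at that size more efficiently.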