Solved – Basic multilevel modeling help in R with glmer

Tags: glmm, multilevel-analysis, r

I am looking at school data and wondering whether school-level disproportionate discipline affects students' academic outcomes.

The data I have are by student, with demographic information, a binary outcome variable for whether the student met or exceeded standards in math and reading that year, and then a school-level "excess risk" measure: the proportion of Black students disciplined minus the proportion of White students disciplined.

So far, I have the following models:

Model 1: null model:

model1 <- glmer(meetexceed ~ 1 + (1|school), family=binomial, data=data)

Edit: Since the p-value for my intercept is low (0.0033), the results indicate that the log odds of meeting or exceeding standards is different from 0. (Since my estimate is negative and a 1 represents that the student met or exceeded standards, this result means that the log odds is less than 0 – i.e. there is a less than 0.5 probability of meeting or exceeding standards across these schools.)
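To make that intercept easier to interpret, the log odds can be converted to a probability with `plogis()` in base R. A small sketch using the intercept estimate that appears in the model 1 output further down (−0.28438):

```r
# Convert the null model's intercept from log odds to a probability.
# -0.28438 is the intercept estimate from summary(model1) shown below.
intercept <- -0.28438
p <- plogis(intercept)   # inverse logit: 1 / (1 + exp(-x))
p                        # ~0.429: just under a 50% chance of meeting/exceeding
```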

Model 2: random intercept, fixed predictor

model2 <- glmer(meetexceed ~ discipline + (1|school), family=binomial, data=data)

Edit: I changed the model I'm investigating. In this case, I added the "discipline" variable as a predictor. This is a binomial variable, where 1 indicates that the student has received discipline at least once during the school year. This model should tell me if discipline is a predictor of meeting/exceeding standards.
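Coefficients on the log-odds scale are hard to read directly; exponentiating a glmer coefficient gives an odds ratio. As a sketch using the discipline estimate that model 2 reports in the output further down (−0.97643):

```r
# Odds ratio for discipline, from the model 2 fixed-effect estimate below.
b_discipline <- -0.97643
exp(b_discipline)   # ~0.377: disciplined students have roughly 0.38 times the
                    # odds of meeting/exceeding standards, holding school constant
```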

Model 3: random intercept, random slope

model3 <- glmer(meetexceed ~ discipline + (1+discipline|school), family=binomial, data=data)

So unlike the previous model, this one allows the effect of discipline on the meetexceed variable to vary across schools.
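One way to see what a random slope means is to simulate school-specific slopes scattered around the overall (fixed) effect. This is only an illustration; the numbers mirror the estimates model 3 eventually reports further down (fixed slope ≈ −0.872, between-school SD ≈ 0.3475):

```r
set.seed(42)
fixed_slope <- -0.872    # overall discipline effect (log-odds scale)
slope_sd    <- 0.3475    # between-school SD of that effect
# Each of the 48 schools gets its own slope = fixed effect + random deviation
school_slopes <- fixed_slope + rnorm(48, mean = 0, sd = slope_sd)
summary(school_slopes)   # slopes cluster near -0.87, with some schools well away
```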

How am I doing so far?

Then to compare models, I can use the anova() function. My adviser mentioned needing to constrain the covariance between the slope and intercept in the last model in order to determine if the difference in models is due solely to the difference in slope variation. How do I add that constraint?

My idea:

model4 <- glmer(meetexceed ~ discipline + (1|school) + (0 + discipline|school), family=binomial, data=data)

If I run all 4 of these models, I get the following (lengthy) output:

> summary(model1<-glmer(meetexceed ~ 1 + (1|school),
+                      family=binomial,
+                      data=U46))
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: meetexceed ~ 1 + (1 | school)
   Data: U46

     AIC      BIC   logLik deviance df.resid 
 22733.7  22749.2 -11364.8  22729.7    17587 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.6476 -0.7727 -0.5674  0.9141  1.8802 

Random effects:
 Groups Name        Variance Std.Dev.
 school (Intercept) 0.4343   0.659   
Number of obs: 17589, groups:  school, 48

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept) -0.28438    0.09668  -2.942  0.00327 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> summary(model2<-glmer(meetexceed ~ discipline + (1|school),
+                       family=binomial,
+                       data=U46))
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: meetexceed ~ discipline + (1 | school)
   Data: U46

     AIC      BIC   logLik deviance df.resid 
 22238.9  22262.2 -11116.4  22232.9    17586 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.6563 -0.7890 -0.4842  0.8573  2.9180 

Random effects:
 Groups Name        Variance Std.Dev.
 school (Intercept) 0.4176   0.6462  
Number of obs: 17589, groups:  school, 48

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.12955    0.09512  -1.362    0.173    
discipline  -0.97643    0.04518 -21.614   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
           (Intr)
discipline -0.067
> summary(model3<-glmer(meetexceed ~ discipline 
+                       + (1 + discipline|school),
+                       family=binomial,
+                       data=U46))
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: meetexceed ~ discipline + (1 + discipline | school)
   Data: U46

     AIC      BIC   logLik deviance df.resid 
 22216.5  22255.4 -11103.3  22206.5    17584 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.6556 -0.7891 -0.4644  0.8605  2.6541 

Random effects:
 Groups Name        Variance Std.Dev. Corr 
 school (Intercept) 0.4231   0.6505        
        discipline  0.1208   0.3475   -0.17
Number of obs: 17589, groups:  school, 48

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.13793    0.09574  -1.441     0.15    
discipline  -0.87210    0.07646 -11.406   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
           (Intr)
discipline -0.156
> summary(model4<-glmer(meetexceed ~ discipline  +
+                         (1|school) + 
+                         (0+discipline|school), 
+                          family=binomial,data=U46))
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: meetexceed ~ discipline + (1 | school) + (0 + discipline | school)
   Data: U46

     AIC      BIC   logLik deviance df.resid 
 22215.1  22246.2 -11103.5  22207.1    17585 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.6555 -0.7896 -0.4639  0.8616  2.7517 

Random effects:
 Groups   Name        Variance Std.Dev.
 school   (Intercept) 0.4189   0.6472  
 school.1 discipline  0.1187   0.3445  
Number of obs: 17589, groups:  school, 48

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.13714    0.09529  -1.439     0.15    
discipline  -0.87749    0.07612 -11.528   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
           (Intr)
discipline -0.047
> 
> anova(model1,model2,model3,model4)
Data: U46
Models:
model1: meetexceed ~ 1 + (1 | school)
model2: meetexceed ~ discipline + (1 | school)
model4: meetexceed ~ discipline + (1 | school) + (0 + discipline | school)
model3: meetexceed ~ discipline + (1 + discipline | school)
       Df   AIC   BIC logLik deviance    Chisq Chi Df Pr(>Chisq)    
model1  2 22734 22749 -11365    22730                               
model2  3 22239 22262 -11116    22233 496.7950      1  < 2.2e-16 ***
model4  4 22215 22246 -11104    22207  25.8114      1  3.765e-07 ***
model3  5 22217 22255 -11103    22207   0.5194      1     0.4711    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
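The chi-square tests in this anova table can be reproduced by hand: twice the difference in log-likelihood between nested models is compared against a chi-square distribution with df equal to the difference in parameter counts. A sketch for the model 2 vs. model 4 comparison, using the chi-square value from the table above:

```r
# Likelihood-ratio test of model 4 (adds the random slope) against model 2
chisq <- 25.8114                       # 2 * (logLik(model4) - logLik(model2))
p <- pchisq(chisq, df = 1, lower.tail = FALSE)
p                                      # ~3.8e-07, matching the anova() table
```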

The conclusions that I come to are:

  1. the log odds of meeting/exceeding standards are less than 0 (equivalently, the odds are less than 1, i.e. the probability is below 0.5)
  2. discipline is a significant predictor of meeting/exceeding standards (in the negative direction – being disciplined decreases the log odds of meeting/exceeding standards)
  3. because model 4 is significantly different from model 2, we can say that the effect of discipline does vary across schools

Is any of this correct?

And finally, I'm not sure I understand the difference between models 3 and 4. I could use some help there.

Best Answer

Model 1

Since the p-value for my intercept is low (0.0033), the results indicate that there IS a difference between schools. Is this correct?

No, the intercept reflects the overall intercept across schools; the school-to-school variation is represented in the level-2 variance term. A significant intercept tells you that, overall, the log odds of students meeting or exceeding standards in math and reading are different from zero (you don't mention whether the estimate is positive or negative, so I can't comment on the direction of the effect). To answer the question of whether there is significant variation in the baseline probability of students meeting or exceeding standards from school to school, you need to examine the school-level (random intercept) variance, e.g. by testing the model against one without the random effect.
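One common way to summarize how much of the variation lies between schools in a logit model is the latent-scale intraclass correlation (ICC), which compares the school-level variance to the total variance with the level-1 residual fixed at π²/3. This is an illustration (a standard convention, not something the question asked about directly), using the null model's intercept variance of 0.4343:

```r
# Latent-scale ICC for the binomial null model (variance from model 1)
school_var <- 0.4343
icc <- school_var / (school_var + pi^2 / 3)
icc   # ~0.117: roughly 12% of the latent-scale variation is between schools
```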

Model 2

race is a predictor for the meetexceed variable ... but assuming that the race variable effect is the same for all schools (hence the 1|school). Is this correct?

Yes, by specifying (1|school), you're not allowing the model to estimate the race effects separately for each school. You'll only get one estimate for each race dummy variable, not separate ones for each school. NOTE: See my question below about how you're handling these dummy variables.

Next steps

what I want to do next is see if the effect of race (which are all statistically significant) is the same for all schools (is this "random slope?")

Run a second model which is the same as the one you specified here (although I recommend you re-do your dummy codes, as I mention below), except also allowing random effects for race: (1 + dummyAsian + dummyBlack + dummyHisporLat + dummyTwoorMore|school). Then test this new model against the one with no random effect of race. You can use anova for this: anova(model2, model3)

then also somehow incorporate the "excess risk" variable, which is the same for all students in a particular school, but which varies by school.

You can just add it to the model, like any other predictor.
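A sketch of what that might look like, assuming `excess_risk` is a column in the data (the same value repeated for every student within a school); this is illustrative only, not fitted here, and the column name is hypothetical:

```r
# Hypothetical: add the school-level excess-risk predictor alongside the
# race dummies. The excess_risk column name is assumed; match it to your data.
model5 <- glmer(meetexceed ~ dummyAsian + dummyBlack + dummyHisporLat +
                  dummyTwoorMore + excess_risk + (1 | school),
                family = binomial, data = data)
```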

A side note

coding as dummy variables, centered based on overall district proportions

You centered your dummy variables? That's very unusual. Were you trying to enter race as a school-level predictor or something instead? If not, you should just use dummy variables to indicate the students' race, with a 1 for the dummy var that corresponds to their race and 0's for all other dummy vars for that student.