I am looking at school data, and wondering if school-level disproportionate discipline affects the academic outcomes of students.
The data I have are by student, with demographic information, a binomial outcome variable for whether the student met or exceeded standards in math and reading for that year, and a school-level "excess risk" measure, defined as the proportion of Black students disciplined minus the proportion of White students disciplined.
So far, I have the following models:
Model 1: null model:
model1 <- glmer(meetexceed ~ 1 + (1|school), family=binomial, data=data)
Edit: Since the p-value for my intercept is low (0.0033), the results indicate that the log odds of meeting or exceeding standards is different from 0. (Since my estimate is negative and a 1 represents that the student met or exceeded standards, this result means that the log odds is less than 0 – i.e. there is a less than 0.5 probability of meeting or exceeding standards across these schools.)
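To make that interpretation concrete, the intercept can be pushed through the inverse logit; using the estimate from the model 1 output further down in this post:

```r
# Inverse logit of the null-model intercept (estimate taken from the
# summary output later in this post)
plogis(-0.28438)
# ≈ 0.429, i.e. just under a 0.5 probability of meeting/exceeding standards
```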
Model 2: random intercept, fixed predictor
model2 <- glmer(meetexceed ~ discipline + (1|school), family=binomial, data=data)
Edit: I changed the model I'm investigating. In this case, I added the "discipline" variable as a predictor. This is a binomial variable, where 1 indicates that the student has received discipline at least once during the school year. This model should tell me if discipline is a predictor of meeting/exceeding standards.
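Once this model is fitted, the discipline coefficient is on the log-odds scale; exponentiating it gives an odds ratio, which is often easier to report. Using the estimate from the model 2 summary further down in this post:

```r
# Convert the log-odds coefficient for discipline to an odds ratio
# (estimate taken from the summary output later in this post)
exp(-0.97643)
# ≈ 0.38: being disciplined multiplies the odds of meeting/exceeding
# standards by about 0.38, holding school constant
```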
Model 3: random intercept, random slope
model3 <- glmer(meetexceed ~ discipline + (1+discipline|school), family=binomial, data=data)
So unlike the previous model, this one allows the effect of discipline on the meetexceed variable to vary across schools.
How am I doing so far?
Then to compare models, I can use the anova() function. My adviser mentioned needing to constrain the covariance between the slope and intercept in the last model in order to determine if the difference in models is due solely to the difference in slope variation. How do I add that constraint?
My idea:
model4 <- glmer(meetexceed ~ discipline + (1|school) + (0 + discipline|school), family=binomial, data=data)
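As an aside, recent versions of lme4 also provide the double-bar shorthand for exactly this uncorrelated-random-effects structure: (1 + discipline || school) expands to the two separate terms above, fixing the intercept-slope correlation at 0 (this works here because discipline is numeric 0/1, not a factor). A minimal sketch with simulated stand-in data, since the real data set is not shown in the post:

```r
library(lme4)

# Simulated stand-in data (the actual data set is not available here)
set.seed(1)
d <- data.frame(
  school = factor(rep(1:20, each = 50)),
  discipline = rbinom(1000, 1, 0.3)
)
d$meetexceed <- rbinom(1000, 1, plogis(-0.3 - 0.9 * d$discipline))

# Double-bar shorthand: equivalent to (1|school) + (0 + discipline|school),
# i.e. random intercept and random slope with their correlation constrained to 0
m_uncorr <- glmer(meetexceed ~ discipline + (1 + discipline || school),
                  family = binomial, data = d)
```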
If I run all 4 of these models, I get the following (lengthy) output:
> summary(model1<-glmer(meetexceed ~ 1 + (1|school),
+ family=binomial,
+ data=U46))
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: meetexceed ~ 1 + (1 | school)
Data: U46
AIC BIC logLik deviance df.resid
22733.7 22749.2 -11364.8 22729.7 17587
Scaled residuals:
Min 1Q Median 3Q Max
-1.6476 -0.7727 -0.5674 0.9141 1.8802
Random effects:
Groups Name Variance Std.Dev.
school (Intercept) 0.4343 0.659
Number of obs: 17589, groups: school, 48
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.28438 0.09668 -2.942 0.00327 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> summary(model2<-glmer(meetexceed ~ discipline + (1|school),
+ family=binomial,
+ data=U46))
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: meetexceed ~ discipline + (1 | school)
Data: U46
AIC BIC logLik deviance df.resid
22238.9 22262.2 -11116.4 22232.9 17586
Scaled residuals:
Min 1Q Median 3Q Max
-1.6563 -0.7890 -0.4842 0.8573 2.9180
Random effects:
Groups Name Variance Std.Dev.
school (Intercept) 0.4176 0.6462
Number of obs: 17589, groups: school, 48
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.12955 0.09512 -1.362 0.173
discipline -0.97643 0.04518 -21.614 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
discipline -0.067
> summary(model3<-glmer(meetexceed ~ discipline
+ + (1 + discipline|school),
+ family=binomial,
+ data=U46))
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: meetexceed ~ discipline + (1 + discipline | school)
Data: U46
AIC BIC logLik deviance df.resid
22216.5 22255.4 -11103.3 22206.5 17584
Scaled residuals:
Min 1Q Median 3Q Max
-1.6556 -0.7891 -0.4644 0.8605 2.6541
Random effects:
Groups Name Variance Std.Dev. Corr
school (Intercept) 0.4231 0.6505
discipline 0.1208 0.3475 -0.17
Number of obs: 17589, groups: school, 48
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.13793 0.09574 -1.441 0.15
discipline -0.87210 0.07646 -11.406 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
discipline -0.156
> summary(model4<-glmer(meetexceed ~ discipline +
+ (1|school) +
+ (0+discipline|school),
+ family=binomial,data=U46))
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: meetexceed ~ discipline + (1 | school) + (0 + discipline | school)
Data: U46
AIC BIC logLik deviance df.resid
22215.1 22246.2 -11103.5 22207.1 17585
Scaled residuals:
Min 1Q Median 3Q Max
-1.6555 -0.7896 -0.4639 0.8616 2.7517
Random effects:
Groups Name Variance Std.Dev.
school (Intercept) 0.4189 0.6472
school.1 discipline 0.1187 0.3445
Number of obs: 17589, groups: school, 48
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.13714 0.09529 -1.439 0.15
discipline -0.87749 0.07612 -11.528 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
discipline -0.047
>
> anova(model1,model2,model3,model4)
Data: U46
Models:
model1: meetexceed ~ 1 + (1 | school)
model2: meetexceed ~ discipline + (1 | school)
model4: meetexceed ~ discipline + (1 | school) + (0 + discipline | school)
model3: meetexceed ~ discipline + (1 + discipline | school)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
model1 2 22734 22749 -11365 22730
model2 3 22239 22262 -11116 22233 496.7950 1 < 2.2e-16 ***
model4 4 22215 22246 -11104 22207 25.8114 1 3.765e-07 ***
model3 5 22217 22255 -11103 22207 0.5194 1 0.4711
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
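One caveat worth knowing about the model2-vs-model4 row: it tests whether a variance component is zero, and since a variance cannot be negative, the null value sits on the boundary of the parameter space. The naive chi-square(1) p-value is therefore conservative; a common correction treats the statistic as a 50:50 mixture of chi-square(0) and chi-square(1), which amounts to halving the p-value. Using the deviances from the table above:

```r
# Likelihood-ratio statistic for model2 vs model4
# (deviances taken from the anova table above)
lrt <- 22232.9 - 22207.1          # ≈ 25.8

# Naive p-value, chi-square with 1 df (matches the anova output, ≈ 3.8e-07)
pchisq(lrt, df = 1, lower.tail = FALSE)

# Boundary-corrected p-value: 50:50 mixture of chi^2_0 and chi^2_1
0.5 * pchisq(lrt, df = 1, lower.tail = FALSE)
```

Here the statistic is so large that the correction changes nothing substantive, but for borderline variance components it can matter.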
The conclusions that I come to are:
- the log odds of meeting/exceeding standards are less than 0 (equivalently, the odds are less than 1, so the probability is below 0.5)
- discipline is a significant predictor of meeting/exceeding standards (in the negative direction – being disciplined decreases the log odds of meeting/exceeding standards)
- because model 4 is significantly different from model 2, we can say that the effect of discipline does vary across schools
Is any of this correct?
And finally, I'm not sure I understand the difference between models 3 and 4. I could use some help there.
Best Answer
Model 1
No, the intercept reflects the overall intercept across schools, and the school-to-school variation is represented in the level-2 variance term. A significant intercept tells you that, overall, the log odds of students meeting or exceeding standards in math and reading is different from zero (per your edit, the estimate is negative, so the overall probability is below 0.5). To answer the question of whether there is significant variation in the baseline probability of students meeting or exceeding standards from school to school, you need to check the random-intercept variance, not the fixed intercept.
Model 2
Yes, by specifying (1|school), you're not allowing the model to estimate the race effects separately for each school. You'll only get one estimate for each race dummy variable, not separate ones for each school. NOTE: See my question below about how you're handling these dummy variables.
Next steps
Run a second model which is the same as the one you specified here (although I recommend you re-do your dummy codes, as I mention below), except also allowing random effects for race: (1 + dummyAsian + dummyBlack + dummyHisporLat + dummyTwoorMore|school). Then test this new model against the one with no random effect of race. You can use anova for this: anova(model2, model3)
As for the school-level excess-risk variable, you can just add it to the model like any other predictor.
A side note
You centered your dummy variables? That's very unusual. Were you trying to enter race as a school-level predictor or something instead? If not, you should just use dummy variables to indicate the students' race, with a 1 for the dummy var that corresponds to their race and 0's for all other dummy vars for that student.
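To illustrate that last recommendation, here is a minimal base-R sketch of standard 0/1 dummy coding; the race levels are made up for the example:

```r
# Hypothetical student-level race factor; model.matrix builds one 0/1 dummy
# column per non-reference level (plus an intercept column)
race <- factor(c("Asian", "Black", "White", "HispOrLat"))
model.matrix(~ race)
```

Each student gets a 1 in the column matching their race and 0s elsewhere, with the reference level absorbed into the intercept.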