Solved – How to interpret GLM coefficients

generalized linear model, regression

I am reproducing the results of the COMPAS analysis done by ProPublica, and I need some help understanding how they interpreted the GLM coefficients. score_factor is a variable indicating risk of recidivism, and it is regressed against variables such as race and gender. The model is given below.

Call:
glm(formula = score_factor ~ gender_factor + age_factor + race_factor + 
    priors_count + crime_factor + two_year_recid, family = "binomial", 
    data = df)

Coefficients:    
                            Estimate Std. Error z value Pr(>|z|)    
(Intercept)                 -1.52554    0.07851 -19.430  < 2e-16 ***
gender_factorFemale          0.22127    0.07951   2.783 0.005388 ** 
age_factorGreater than 45   -1.35563    0.09908 -13.682  < 2e-16 ***
age_factorLess than 25       1.30839    0.07593  17.232  < 2e-16 ***
race_factorAfrican-American  0.47721    0.06935   6.881 5.93e-12 ***
race_factorAsian            -0.25441    0.47821  -0.532 0.594717    
race_factorHispanic         -0.42839    0.12813  -3.344 0.000827 ***
race_factorNative American   1.39421    0.76612   1.820 0.068784 .  
race_factorOther            -0.82635    0.16208  -5.098 3.43e-07 ***
priors_count                 0.26895    0.01110  24.221  < 2e-16 ***
crime_factorM               -0.31124    0.06655  -4.677 2.91e-06 ***
two_year_recid               0.68586    0.06402  10.713  < 2e-16 ***

From the model above, they concluded that "Black defendants are 45% more likely than white defendants to receive a higher score correcting for the seriousness of their crime, previous arrests, and future criminal behavior." (Note that White is the reference level of race_factor, so it is absorbed into the intercept.) The conclusion is based on the following calculation:

control <- exp(-1.52554) / (1 + exp(-1.52554))
exp(0.47721) / (1 - control + (control * exp(0.47721)))
[1] 1.452841

I am aware that exp(0.47721) gives the odds ratio of score_factor for Black versus white defendants. But I am not sure what the calculation involving control and the intercept is doing. Can someone please explain?

Best Answer

The original author of the analysis was kind enough to respond and clear this up for me. The calculation converts an odds ratio (which is highly unintuitive) into a relative risk (which is easier to understand). More details are provided here.

The exponentiated coefficients of a logistic GLM are odds ratios, that is, multiplicative changes in the odds. So if p1 is the risk of getting a high score for Black defendants and p0 is the risk of getting a high score for white defendants, then exp(0.47721) is (p1/(1-p1))/(p0/(1-p0)). Unfortunately, this is not particularly easy to intuit. Relative risk (p1/p0) is a much simpler concept to understand, and an odds ratio can be converted to a relative risk using the formula: Relative risk = odds ratio/(1 − p0 + (p0 × odds ratio)). The calculation in the question does exactly that: control is p0, the baseline risk obtained by applying the inverse logit to the intercept, and exp(0.47721) is the odds ratio.
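To make the conversion concrete, here is a minimal R sketch that plugs the two coefficients from the summary in the question into the formula and checks the result against p1/p0 computed directly. The helper name or_to_rr and the variable names are my own, not part of the original analysis. Note that p0 here is the risk implied by the intercept alone, so it assumes a defendant at the reference level of every factor and with zero for the numeric covariates.

```r
# Hypothetical helper (not from the original analysis): convert an odds
# ratio to a relative risk, given the baseline risk p0.
or_to_rr <- function(odds_ratio, p0) {
  odds_ratio / (1 - p0 + p0 * odds_ratio)
}

# Coefficients copied from the model summary in the question.
intercept <- -1.52554   # (Intercept)
beta_race <- 0.47721    # race_factorAfrican-American

# plogis() is the inverse logit: exp(x) / (1 + exp(x)).
p0 <- plogis(intercept)              # baseline risk ("control" in the question)
p1 <- plogis(intercept + beta_race)  # same defendant, race coefficient added

odds_ratio <- exp(beta_race)
rr_formula <- or_to_rr(odds_ratio, p0)  # via the conversion formula
rr_direct  <- p1 / p0                   # via the definition of relative risk

print(c(p0 = p0, p1 = p1, odds_ratio = odds_ratio,
        rr_formula = rr_formula, rr_direct = rr_direct))

# The two routes to the relative risk should agree.
stopifnot(isTRUE(all.equal(rr_formula, rr_direct)))
```

The value of rr_formula should match the 1.452841 shown in the question. Because the relative risk depends on p0, a different baseline (for example, a defendant with several priors) would give a different relative risk for the same odds ratio.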
