I ran a logistic regression of college acceptance against SAT scores and family / ethnic background. The data are fictional. This is a follow-up to a prior question, already answered. The question focuses on obtaining and interpreting odds ratios, leaving the SAT scores aside for simplicity.
The variables are Accepted (0 or 1) and Background ("red" or "blue"). I set up the data so that people of "red" background were more likely to get in:
fit <- glm(Accepted~Background, data=dat, family="binomial")
exp(cbind(Odds_Ratio_RedvBlue=coef(fit), confint(fit)))
Odds_Ratio_RedvBlue 2.5 % 97.5 %
(Intercept) 0.7088608 0.5553459 0.9017961
Backgroundred 2.4480042 1.7397640 3.4595454
Questions:
- Is 0.7 the odds ratio of a person of "blue" background being accepted? I'm asking because I also get 0.7 for "Backgroundblue" if I instead run the following code:
fit <- glm(Accepted~Background-1, data=dat, family="binomial")
exp(cbind(OR=coef(fit), confint(fit)))
- Shouldn't the odds ratio of "red" being accepted ($\rm Accepted/Red:Accepted/Blue$) just be the reciprocal ($\rm OddsBlue = 1 / OddsRed$)?
Best Answer
I've been working on answering my own question by manually calculating the odds and odds ratios:
So the Odds Ratio of getting into the school of Red over Blue is:
$$ \frac{\rm Odds\ Accept\ If\ Red}{\rm Odds\ Accept\ If\ Blue} = \frac{177/102}{112/158} = \frac{1.7353}{0.7089} = 2.448 $$
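The arithmetic above can be checked with a short R sketch. It assumes the counts implied by the formula (177 accepted and 102 rejected for "red", 112 accepted and 158 rejected for "blue") and rebuilds a one-row-per-applicant data frame from them. The names `counts`, `odds`, `or_red_vs_blue` and the rebuilt `dat` are illustrative and are not the original data object.

```r
# Counts read off the odds-ratio formula above (assumed layout of the 2x2 table):
# red: 177 accepted, 102 rejected; blue: 112 accepted, 158 rejected.
counts <- matrix(c(177, 102,
                   112, 158),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Background = c("red", "blue"),
                                 Outcome = c("accepted", "rejected")))

# Odds of acceptance within each group: accepted / rejected
odds <- counts[, "accepted"] / counts[, "rejected"]

# Odds ratio of red versus blue
or_red_vs_blue <- odds[["red"]] / odds[["blue"]]

print(round(odds, 4))            # about 1.7353 (red) and 0.7089 (blue)
print(round(or_red_vs_blue, 3))  # about 2.448

# Rebuild one row per applicant from the counts (row order is arbitrary).
# "blue" is set as the reference level, matching the output in the question.
dat <- data.frame(
  Background = factor(rep(c("red", "red", "blue", "blue"),
                          times = c(177, 102, 112, 158)),
                      levels = c("blue", "red")),
  Accepted = rep(c(1, 0, 1, 0), times = c(177, 102, 112, 158))
)

# Logistic regression with an intercept, then exponentiate the coefficients
fit <- glm(Accepted ~ Background, data = dat, family = "binomial")
print(exp(coef(fit)))
```

Because the model has one parameter per group, the exponentiated coefficients should agree with the hand-computed odds for "blue" and the red-versus-blue odds ratio up to the convergence tolerance of `glm`.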
The odds ratio of 2.448 is the Backgroundred value returned by the regression with an intercept (Accepted~Background).
At the same time, the (Intercept) corresponds to the denominator of the odds ratio, which is exactly the odds of getting in for someone of 'blue' family background: $112/158 = 0.7089$.
If I instead run the regression without an intercept (Accepted~Background-1), the returned values are precisely the odds of getting in for 'blue', Backgroundblue (0.7089), and the odds of being accepted for 'red', Backgroundred (1.7353). There is no odds ratio there, so the two values are not expected to be reciprocals of each other.
Finally, how should the results be read if there are three levels in the categorical regressor?
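One way to see how the coefficients read for any number of levels is to write the two models out. The symbols below ($p$, $\beta$, $\gamma$, the indicator $I(\cdot)$, and "blue" as the reference level) are notation assumed here for illustration; they do not appear in the original post.

With an intercept and "blue" as the reference level:

$$ \log\frac{p}{1-p} = \beta_0 + \sum_{j \neq \rm blue} \beta_j \, I({\rm Background} = j) $$

so that

$$ {\rm Odds}_{\rm blue} = e^{\beta_0}, \qquad {\rm Odds}_{j} = e^{\beta_0 + \beta_j}, \qquad \frac{{\rm Odds}_{j}}{{\rm Odds}_{\rm blue}} = e^{\beta_j} $$

Without an intercept, every level gets its own coefficient:

$$ \log\frac{p}{1-p} = \sum_{j} \gamma_j \, I({\rm Background} = j), \qquad {\rm Odds}_{j} = e^{\gamma_j} = e^{\beta_0 + \beta_j} \ \ (\text{with } \beta_{\rm blue} = 0) $$

Under this reading, the exponentiated intercept is the odds for the reference level, every other exponentiated coefficient of the first model is an odds ratio against that reference, and the exponentiated coefficients of the second model are plain odds.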
Same manual versus [R] calculation:
I created a different fictitious data set with the same premise, but this time there were three ethnic backgrounds: "red", "blue" and "orange", and ran the same sequence:
First, the contingency table:
And calculated the Odds of getting in for each ethnic group:
As well as the different Odds Ratios:
And proceeded with the now routine logistic regression followed by exponentiation of coefficients:
Yielding the odds of getting in for "blues" as the (Intercept), the odds ratio of orange versus blue in Backgroundorange, and the odds ratio of red versus blue in Backgroundred.
On the other hand, the regression without intercept predictably returned just the three independent odds:
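The same comparison can be sketched end to end in R. The counts below are hypothetical and chosen only so the odds come out as round numbers; they are not the data set used above. The object names (`accepted`, `rejected`, `dat3`, `fit_int`, `fit_noint`) are likewise made up for this example.

```r
# Hypothetical counts (NOT the data from the post): accepted / rejected per group
accepted <- c(blue = 40, orange = 60, red = 90)
rejected <- c(blue = 80, orange = 60, red = 45)

# Manual odds per group, and odds ratios against the reference level "blue"
odds <- accepted / rejected
or_vs_blue <- odds[c("orange", "red")] / odds[["blue"]]

# Expand the counts into one row per applicant; "blue" is the first (reference) level
bg <- names(accepted)
dat3 <- data.frame(
  Background = factor(rep(rep(bg, 2), times = c(accepted, rejected)), levels = bg),
  Accepted = rep(c(1, 0), times = c(sum(accepted), sum(rejected)))
)

# With intercept: exp(Intercept) is the odds for blue,
# the other exponentiated coefficients are odds ratios versus blue
fit_int <- glm(Accepted ~ Background, data = dat3, family = "binomial")

# Without intercept: each exponentiated coefficient is that group's own odds
fit_noint <- glm(Accepted ~ Background - 1, data = dat3, family = "binomial")

print(odds)
print(or_vs_blue)
print(exp(coef(fit_int)))
print(exp(coef(fit_noint)))
```

With these made-up counts the manual odds are 0.5, 1 and 2 for blue, orange and red, so the first model should return roughly 0.5, 2 and 4 (odds for blue, then two odds ratios versus blue) and the second roughly 0.5, 1 and 2 (three odds).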