Regression – How to Interpret Logistic Regression Coefficient for More Than 2 Dummy Variables

categorical-data, logistic, odds-ratio, regression

I'm not sure how to interpret the coefficients of a variable that has more than 2 levels. Please note that my model contains explanatory variables that are numeric, binary, and multi-category.

My response variable is coded $$0 = \text{no late debt payment} , 1 = \text{has late debt payment}$$ and one of the x variables in the model is education level, coded as:
$$
1 = \text{no high school diploma/GED} \\
2 = \text{has high school diploma/GED}\\
3 = \text{some college education}\\
4 = \text{college education}
$$

In the R glm output (family = "binomial"), the coefficients for the dummy variables are:
$$
\text{EDCL2}= 0.48430 \\
\text{EDCL3}= 0.89571 \\
\text{EDCL4}= 0.45851
$$

After exponentiating them, they are :
$$
\text{EDCL2}= 1.56 \\
\text{EDCL3}= 2.36 \\
\text{EDCL4}= 1.38
$$

So my interpretation is as follows:

EDCL2: Implies that a respondent that has completed high school education is about 1.56 times as likely to have a late debt payment as a respondent that has NOT completed high school.

EDCL3: Implies that a respondent that has some college education is about 2.36 times as likely to have a late debt payment as a respondent that has NOT completed high school.

EDCL4: Implies that a respondent that has a college education is about 1.38 times as likely to have a late debt payment as a respondent that has NOT completed high school.

Is this interpretation correct? I suspect it may be more complex than that. If so, what would be the right way to interpret these coefficients? Any help is appreciated. Thank you!

Best Answer

The original coefficients are additive on the log-odds scale, so the exponentiated coefficients ARE multiplicative, but on the odds scale, not the probability scale. "... (T)imes as likely" describes a ratio of probabilities, so it is not accurate here.
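
To make the scale explicit, here is a sketch of the model being described. It assumes the usual dummy coding with education level 1 as the reference group (which the coefficient names EDCL2 to EDCL4 suggest). The symbols $$p$$ (probability of a late debt payment) and $$\beta_0, \beta_2, \beta_3, \beta_4$$ are notation introduced here, not taken from the output.
$$
\log\frac{p}{1-p} = \beta_0 + \beta_2\,\text{EDCL2} + \beta_3\,\text{EDCL3} + \beta_4\,\text{EDCL4} + (\text{other predictors})
$$
Holding the other predictors fixed, moving from level 1 to level $$k$$ adds $$\beta_k$$ to the log-odds, which multiplies the odds by $$e^{\beta_k}$$:
$$
\frac{\text{odds}(\text{level } k)}{\text{odds}(\text{level } 1)} = e^{\beta_k}, \qquad \text{odds} = \frac{p}{1-p}, \qquad k = 2, 3, 4
$$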

For example, the odds that a six-sided die will come up "1" on the next roll are 1:5, or 0.2, whereas the odds that it will come up either "1" or "2" are 2:4, or 0.5. The probability has exactly doubled (from 1/6 to 2/6), but the odds have more than doubled.

For another example, let's say the prevalence of a rare disease is 1 in 1 million people. Then the odds that a person has the disease are 1:999,999. If some condition multiplied someone's odds by 10, their odds would be 10:999,999 and their chances would be 10 in 1,000,009, which is nearly ten times the original probability, but not quite.
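
The arithmetic in these two examples can be checked with a short Python sketch. The helper names `prob_to_odds` and `odds_to_prob` are made up for this illustration. Exact fractions are used so that no rounding gets in the way.

```python
from fractions import Fraction


def prob_to_odds(p):
    """Convert a probability to odds: p / (1 - p)."""
    return p / (1 - p)


def odds_to_prob(odds):
    """Convert odds back to a probability: odds / (1 + odds)."""
    return odds / (1 + odds)


# Die example: P("1") = 1/6 versus P("1" or "2") = 2/6.
p_one = Fraction(1, 6)
p_one_or_two = Fraction(2, 6)
odds_one = prob_to_odds(p_one)                # 1:5
odds_one_or_two = prob_to_odds(p_one_or_two)  # 2:4
prob_ratio_die = p_one_or_two / p_one         # ratio of probabilities
odds_ratio_die = odds_one_or_two / odds_one   # ratio of odds

# Rare-disease example: baseline prevalence of 1 in 1,000,000,
# then the odds (not the probability) are multiplied by 10.
p_base = Fraction(1, 1_000_000)
odds_base = prob_to_odds(p_base)              # 1:999,999
p_raised = odds_to_prob(10 * odds_base)       # 10 in 1,000,009
prob_ratio_disease = p_raised / p_base

print("die:     probability ratio =", prob_ratio_die, "| odds ratio =", odds_ratio_die)
print("disease: raised probability =", p_raised)
print("disease: probability ratio =", float(prob_ratio_disease))
```

For the die, the probability ratio is 2 while the odds ratio is 5/2. For the rare disease, an odds ratio of 10 corresponds to a probability ratio just under 10.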

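These two examples show that the baseline matters: how an odds ratio translates into a ratio of probabilities depends on the reference group's probability, which is set by the intercept (and the other covariates). The coefficients alone therefore don't tell us how much more likely a late debt payment is in groups 2 through 4 than in group 1. It would be valid to say that, holding the other variables fixed, the estimated log-odds are 0.48 higher in group 2 than in group 1 and (equivalently) that the estimated odds are 1.56 times as large in group 2 as in group 1.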
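
As a rough illustration of why the baseline matters, the Python sketch below exponentiates the three coefficients exactly as quoted in the question and converts each odds ratio into a ratio of probabilities for a few reference-group probabilities. Those baseline probabilities are hypothetical, since the intercept and the other covariates are not given. The helper name `risk_ratio` is likewise made up here. Note that exponentiating the quoted coefficients gives about 1.62, 2.45 and 1.58, which differs somewhat from the rounded figures in the question.

```python
import math

# Coefficients as quoted in the question (log-odds scale, reference = level 1).
coefs = {"EDCL2": 0.48430, "EDCL3": 0.89571, "EDCL4": 0.45851}
odds_ratios = {name: math.exp(beta) for name, beta in coefs.items()}


def risk_ratio(odds_ratio, p_ref):
    """Ratio of probabilities implied by an odds ratio, given the
    reference group's probability p_ref (an assumed input here)."""
    odds_ref = p_ref / (1 - p_ref)
    odds_new = odds_ratio * odds_ref
    p_new = odds_new / (1 + odds_new)
    return p_new / p_ref


# Hypothetical reference-group probabilities of a late payment.
baselines = (0.01, 0.10, 0.50)

for name, oratio in odds_ratios.items():
    ratios = ", ".join(
        f"p_ref={p:.2f}: {risk_ratio(oratio, p):.2f}x" for p in baselines
    )
    print(f"{name}: odds ratio {oratio:.2f} -> probability ratio at {ratios}")
```

Under these assumptions the probability ratio is close to the odds ratio only when the reference probability is small, and it shrinks toward 1 as that probability grows. This is why "times as likely" cannot be read off the coefficients alone.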
