Solved – categorical independent variable with three levels and binary logistic regression

categorical-encoding, contrasts, hypothesis-testing, logistic, regression

I want to learn which level of a categorical independent variable I should look at to interpret the odds ratios in binary logistic regression. For example, I have one categorical independent variable (education) with three levels. The results of the binary logistic regression show that "edu" (reference level) is not significant, but one of the levels, "edu(2)", is significant.

Is it possible to ignore the overall significance of "edu" and interpret "edu(2)"?

Best Answer

You should be very careful with that! We need to know what the three levels of edu signify and how the categorical variable is encoded. The most common encoding is dummy (treatment) coding, which is one-hot encoding with the reference level's column dropped. In that case, each of the two estimated parameters you see represents the difference (on the log-odds scale) between that level and the reference level. Do you really want to test the hypothesis that each of those differences is zero?
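To make the dummy-coding interpretation concrete, here is a minimal sketch in Python using made-up counts for a hypothetical three-level factor (levels "low", "medium", "high", with "low" as the reference; these names and numbers are assumptions, not the asker's data). It relies on the fact that when the model contains only this one factor, the maximum-likelihood fit reproduces each level's observed log-odds, so the coefficients can be computed directly from the counts instead of calling a model-fitting routine.

```python
import math

# Hypothetical counts (outcome = 1, outcome = 0) for each level of a three-level
# factor; "low" is treated as the reference level.
counts = {"low": (30, 70), "medium": (45, 55), "high": (33, 67)}


def log_odds(successes, failures):
    """Observed log-odds of one level (the MLE in a one-factor logistic model)."""
    return math.log(successes / failures)


def treatment_coding(counts, reference):
    """Intercept and dummy-coded coefficients of a one-factor logistic regression.

    The intercept is the reference level's log-odds; every other coefficient is
    that level's log-odds minus the reference level's log-odds.
    """
    ref_logit = log_odds(*counts[reference])
    coefs = {
        level: log_odds(*c) - ref_logit
        for level, c in counts.items()
        if level != reference
    }
    return ref_logit, coefs


intercept, coefs = treatment_coding(counts, reference="low")
for level, beta in coefs.items():
    # exp(beta) is the odds ratio of this level versus the reference level
    print(f"{level} vs low: beta = {beta:.3f}, odds ratio = {math.exp(beta):.3f}")
```

Each printed odds ratio compares one level with the reference only; a test on that coefficient says nothing about, for example, "medium" versus "high".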

Depending on how you encode a categorical variable, the estimated parameters represent different hypotheses! This is discussed in more detail in some other posts: "Categorical variable coding to compare all levels to all levels", "Regression with categorical predictors - use only some dummy variables", and "Can I ignore coefficients for non-significant levels of factors in a linear model?"
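A small sketch of that point, again with hypothetical counts and level names and the same one-factor shortcut (fitted log-odds equal observed log-odds): treatment coding and sum (deviation) coding describe exactly the same fitted model but produce different parameters, and therefore different null hypotheses when each parameter is tested against zero.

```python
import math

# Hypothetical counts (outcome = 1, outcome = 0) per level; toy data only.
counts = {"low": (30, 70), "medium": (45, 55), "high": (33, 67)}


def log_odds(successes, failures):
    return math.log(successes / failures)


logits = {level: log_odds(*c) for level, c in counts.items()}

# Treatment (dummy) coding with reference "low":
# intercept = log-odds of "low"; coefficient = log-odds(level) - log-odds("low").
treat_intercept = logits["low"]
treat_coefs = {lvl: logits[lvl] - logits["low"] for lvl in ("medium", "high")}

# Sum (deviation) coding, with "high" as the level without its own column:
# intercept = unweighted mean of the level log-odds; coefficient = log-odds(level) - mean.
grand_mean = sum(logits.values()) / len(logits)
sum_intercept = grand_mean
sum_coefs = {lvl: logits[lvl] - grand_mean for lvl in ("low", "medium")}

print("treatment:", treat_intercept, treat_coefs)
print("sum:      ", sum_intercept, sum_coefs)


def fitted_treatment(level):
    # the reference level has no coefficient, i.e. its effect is 0
    return treat_intercept + treat_coefs.get(level, 0.0)


def fitted_sum(level):
    if level in sum_coefs:
        return sum_intercept + sum_coefs[level]
    # the omitted level's effect is minus the sum of the other effects
    return sum_intercept - sum(sum_coefs.values())


# Both parameterizations reproduce the same fitted log-odds for every level.
for level in counts:
    assert math.isclose(fitted_treatment(level), fitted_sum(level))
```

A test of `treat_coefs["medium"] = 0` asks whether "medium" differs from "low", whereas a test of `sum_coefs["medium"] = 0` asks whether "medium" differs from the unweighted average of the three levels' log-odds. Which level is left without its own column under sum coding is a convention that differs between software packages.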

What you should do (ideally as part of a written analysis plan, prepared before collecting data or starting the analysis): decide which contrasts you want to test, and then test only those contrasts (a contrast is a difference between betas or, more generally, a linear combination of betas with coefficients summing to zero). If you decide on contrasts after starting the analysis, you are in the territory of post-hoc testing.
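As an illustration of testing a single pre-planned contrast, here is a hedged sketch of a Wald z-test in the cell-means parameterization, where each beta is one level's log-odds. The counts, the level names and the choice of contrast are hypothetical. The variance formula 1/successes + 1/failures only holds for this one-factor (saturated) model with independent groups; with other covariates in the model you would instead use the estimated coefficient covariance matrix from the fitted model.

```python
import math

# Hypothetical counts (outcome = 1, outcome = 0) per education level.
counts = {"low": (30, 70), "medium": (45, 55), "high": (33, 67)}


def logit_and_var(successes, failures):
    """Log-odds of one level and its large-sample (Wald) variance 1/s + 1/f."""
    return math.log(successes / failures), 1 / successes + 1 / failures


def wald_contrast(counts, weights):
    """Wald z-test of sum_g w_g * logit_g = 0 in a one-factor logistic model.

    The weights must sum to zero. The levels are independent samples, so the
    variance of the contrast is sum_g w_g**2 * var(logit_g).
    """
    if not math.isclose(sum(weights.values()), 0.0, abs_tol=1e-12):
        raise ValueError("contrast weights must sum to zero")
    estimate = 0.0
    variance = 0.0
    for level, w in weights.items():
        logit, var = logit_and_var(*counts[level])
        estimate += w * logit
        variance += w ** 2 * var
    z = estimate / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return estimate, z, p_value


# A contrast fixed in advance (hypothetical): "medium" vs. the average of the other two.
est, z, p = wald_contrast(counts, {"low": -0.5, "medium": 1.0, "high": -0.5})
print(f"log-odds contrast = {est:.3f}, z = {z:.2f}, p = {p:.4f}")
```

The weights (-0.5, 1, -0.5) sum to zero, so this is a contrast in the sense described above; the function refuses weights that do not.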