How to interpret insignificant categorical variables for logistic regression

Tags: categorical-data, interpretation, logistic, regression

I am trying to interpret categorical variables with more than two classes. Some classes are significant while others are not. What can I infer from the non-significant ones? Does it mean that a non-significant class and the reference category influence the dependent variable equally?

For example:

ETHNICITY (Reference Category - Indian)
Other Asian: Sig = .273     exp(b) = 1.123 
African:     Sig = .000     exp(b) =  .148

Best Answer

In general, you do not want to interpret the p-values for the individual levels of a categorical variable that come with typical statistical output. By default, categorical variables are represented in a model (logistic regression or otherwise) using reference level coding [1]. The tests in your statistical output compare each non-reference category to the reference category, and only to the reference category. It is quite possible for none of those comparisons to be significant while there are still significant differences amongst the non-reference categories. To get a meaningful test of the categorical variable as a whole, fit a reduced model that drops all of its levels (i.e., the whole categorical variable) and compare it to the full model with a nested model test [2]. Note that the reduced model could be a null (intercept-only) model, and that a nested model test for a logistic regression is a likelihood ratio test rather than an $F$-test.
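
As a minimal sketch of such a likelihood ratio test, consider the special case where the categorical variable is the only predictor, so that the reduced model is the intercept-only model. In that case the fitted probability for each level is simply the observed proportion in that level, and the log-likelihoods can be computed by hand. The counts below are invented for illustration (they are not the asker's data), and the closed-form p-value is valid only because three levels give 2 degrees of freedom. With other covariates in the model you would instead fit both models in your software and compare their log-likelihoods.

```python
import math

# Hypothetical (events, total) counts per level; invented for illustration only.
counts = {
    "Indian": (100, 200),
    "Other Asian": (106, 200),
    "African": (26, 200),
}

def binom_loglik(events, total, p):
    """Log-likelihood of `events` successes in `total` Bernoulli trials at probability p."""
    return events * math.log(p) + (total - events) * math.log(1 - p)

# Full model: one fitted probability per level (the observed proportion).
ll_full = sum(binom_loglik(e, n, e / n) for e, n in counts.values())

# Reduced (intercept-only) model: one pooled probability for everyone.
total_events = sum(e for e, _ in counts.values())
total_n = sum(n for _, n in counts.values())
ll_null = binom_loglik(total_events, total_n, total_events / total_n)

# Likelihood ratio statistic, referred to a chi-square with (levels - 1) df.
lr_stat = 2 * (ll_full - ll_null)
df = len(counts) - 1

# For exactly 2 df the chi-square upper tail has the closed form exp(-x / 2).
assert df == 2, "closed-form p-value below only holds for 2 degrees of freedom"
p_value = math.exp(-lr_stat / 2)

print(f"LR statistic = {lr_stat:.2f} on {df} df, p = {p_value:.3g}")
```

The single p-value from this test answers the question "does ethnicity matter at all?", which none of the per-level p-values in the original output can answer on its own.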

If you believe, based in part on the result of the nested model test, that the categorical variable is relevant to the model, then you can try to determine which levels differ. This is analogous to determining which groups differ following a one-way ANOVA. Bear in mind that a non-significant difference between two levels does not mean those levels are the same [3] with respect to how they influence the dependent variable.
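
One possible way to follow up is to compare every pair of levels, not just each level against the reference. The sketch below does this with Wald tests on the log odds ratio from each 2x2 table and a Bonferroni adjustment. Both choices are assumptions made for illustration, not a method prescribed above, and the counts are again invented. The function name `log_odds_ratio_test` is hypothetical.

```python
import math
from itertools import combinations

# Hypothetical (events, total) counts per level; invented for illustration only.
counts = {
    "Indian": (100, 200),
    "Other Asian": (106, 200),
    "African": (26, 200),
}

def log_odds_ratio_test(a_events, a_total, b_events, b_total):
    """Wald test of the odds ratio of group b relative to group a.

    Returns (odds_ratio, two_sided_p_value).
    """
    a_non = a_total - a_events
    b_non = b_total - b_events
    log_or = math.log((b_events / b_non) / (a_events / a_non))
    # Standard error of a log odds ratio from a 2x2 table.
    se = math.sqrt(1 / a_events + 1 / a_non + 1 / b_events + 1 / b_non)
    z = log_or / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return math.exp(log_or), p

pairs = list(combinations(counts, 2))
results = {}
for ref, other in pairs:
    odds_ratio, p = log_odds_ratio_test(*counts[ref], *counts[other])
    # Bonferroni: multiply by the number of comparisons, cap at 1.
    p_adj = min(1.0, p * len(pairs))
    results[(ref, other)] = (odds_ratio, p_adj)
    print(f"{other} vs {ref}: OR = {odds_ratio:.3f}, adjusted p = {p_adj:.3g}")
```

Note that the comparison between the two non-reference levels (African vs Other Asian here) never appears in the default regression output, although its odds ratio can be recovered as the ratio of the two reported exp(b) values. A large adjusted p-value for a pair still only means the data cannot distinguish those two levels, not that they are equal.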

1. To better understand reference cell coding, my answer here may help: "Regression based for example on days of week".
2. I explain nested model tests here: "Testing for moderation with continuous vs. categorical moderators".
3. For more on that idea, it may help you to read my answer here: "Why do statisticians say a non-significant result means 'you can't reject the null', as opposed to accepting the null hypothesis?"