Logistic regression: what to do if only some classes of a categorical variable appear significant

categorical-data, logistic

I am performing (rare-events) logistic regression analyses in R and want to test several categorical variables that have more than two classes. I understand that I can do this by using factor(). However, I am not sure what to do when some, but not all, categories are significant. Can I (1) simply leave out the ones that are not significant, assuming their coefficients equal 0; (2) do I have to include them all in the model; or (3) do I need to refit the logistic regression model with only the significant dummies?
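For concreteness, here is a minimal sketch of the setup the question describes. It assumes plain maximum-likelihood `glm()` (not a rare-events correction) and simulated data with a hypothetical three-level factor `grp`; neither comes from the original post. With R's default treatment contrasts, a factor with k levels is expanded into k - 1 dummy columns, each compared against the reference (first) level.

```r
# Hypothetical simulated data: three-level factor `grp`, binary outcome `y`
set.seed(1)
n <- 300
grp <- factor(sample(c("A", "B", "C"), n, replace = TRUE))
# True log-odds per level (values chosen arbitrarily for illustration)
eta <- c(A = -1, B = -0.8, C = 0.5)[as.character(grp)]
y <- rbinom(n, 1, plogis(eta))
dat <- data.frame(y, grp)

# Treatment contrasts: intercept plus k - 1 = 2 dummies (grpB, grpC),
# both measured relative to the reference level "A"
X <- model.matrix(~ grp, data = dat)
head(X)

# Standard logistic regression; one Wald test per dummy
fit <- glm(y ~ grp, family = binomial, data = dat)
summary(fit)$coefficients
```

The per-dummy rows in `summary(fit)` are what the question calls "some categories being significant"; each one only compares a level with "A".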

Thanks in advance!

Best Answer

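The contrasts you mention depend on the choice of reference cell, so they are arbitrary: change the reference level and a different set of dummies will look "significant". Removing dummy variables (which forces those categories to share the reference category's coefficient, i.e. combines them) on the basis of their p-values will inflate the type I error, ruin confidence interval coverage, and bias the estimates. There is nothing wrong with having 'insignificant' effects in a model. If you want to know whether the variable matters, test all of its dummies jointly, which does not depend on the reference cell.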
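The point about the reference cell can be checked directly. The sketch below uses the same kind of hypothetical simulated data and plain `glm()` (both assumptions for illustration) and fits the model twice with different reference levels. The individual Wald p-values change between the two fits, but the deviance and the likelihood-ratio test of the whole factor (k - 1 = 2 degrees of freedom) are identical.

```r
# Hypothetical simulated data: three-level factor `grp`, binary outcome `y`
set.seed(1)
n <- 300
grp <- factor(sample(c("A", "B", "C"), n, replace = TRUE))
eta <- c(A = -1, B = -0.8, C = 0.5)[as.character(grp)]
y <- rbinom(n, 1, plogis(eta))
dat <- data.frame(y, grp)

# Same model, two different reference levels
fit_A <- glm(y ~ grp, family = binomial, data = dat)      # reference "A"
dat$grp_C <- relevel(dat$grp, ref = "C")
fit_C <- glm(y ~ grp_C, family = binomial, data = dat)    # reference "C"

# Per-dummy Wald p-values depend on the arbitrary reference choice
summary(fit_A)$coefficients[, "Pr(>|z|)"]
summary(fit_C)$coefficients[, "Pr(>|z|)"]

# The two fits are the same model: identical deviance
dev_A <- deviance(fit_A)
dev_C <- deviance(fit_C)

# Joint likelihood-ratio test of the whole factor against the intercept-only model
fit_0 <- glm(y ~ 1, family = binomial, data = dat)
lr_A <- anova(fit_0, fit_A, test = "Chisq")
lr_C <- anova(fit_0, fit_C, test = "Chisq")
lr_A
```

In a model with several predictors, `drop1(fit, test = "LRT")` gives the same kind of joint test for each term, so every factor is assessed as a unit rather than dummy by dummy.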