Logistic Regression – Multicollinearity Concerns and Pitfalls

Tags: logistic, multicollinearity, regression

In Logistic Regression, is there a need to be as concerned about multicollinearity as you would be in straight up OLS regression?

For example, with a logistic regression where multicollinearity exists, would you need to be cautious (as you would in OLS regression) when drawing inferences from the beta coefficients?

For OLS regression, one "fix" for high multicollinearity is ridge regression. Is there something like that for logistic regression? There are also options like dropping variables or combining variables.

What approaches are reasonable for reducing the effects of multicollinearity in a logistic regression? Are they essentially the same as in OLS?

(Note: this is not for the purpose of a designed experiment)

Best Answer

All of the same principles concerning multicollinearity apply to logistic regression as they do to OLS. The same diagnostics for assessing multicollinearity can be used (e.g., VIF, condition number, auxiliary regressions), and the same dimension reduction techniques can be used (such as combining variables via principal components analysis).
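For instance, here is a minimal sketch of those diagnostics in Python, assuming statsmodels and pandas are available; the simulated data and variable names are purely illustrative:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools import add_constant

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Include an intercept so each VIF comes from an auxiliary regression with a constant
Xc = add_constant(X)
vifs = pd.Series(
    [variance_inflation_factor(Xc.values, i) for i in range(1, Xc.shape[1])],
    index=X.columns,
)
print(vifs)                                  # x1 and x2 should show large VIFs

# Condition number of the column-scaled design matrix, another standard diagnostic
print(np.linalg.cond((X / X.std()).values))
```

Commonly cited rules of thumb flag VIFs above roughly 5–10, or condition numbers above roughly 30, as cause for concern.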

This answer by chl will lead you to some resources and R packages for fitting penalized logistic models (as well as a good discussion of these types of penalized regression procedures). But some of your comments about "solutions" to multicollinearity are a bit disconcerting to me. If you only care about estimating relationships for variables that are not collinear, these "solutions" may be fine, but if you're interested in estimating coefficients of variables that are collinear, these techniques do not solve your problem. Although the problem of multicollinearity is technical, in that your matrix of predictor variables cannot be inverted, it has a logical analog in that your predictors are not independent, and so their effects cannot be uniquely identified.
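As a concrete illustration of the penalized approach, here is a minimal sketch of ridge (L2) penalized logistic regression using scikit-learn; the simulated data, the choice of `C` (the inverse penalty strength), and `penalty=None` (which needs scikit-learn 1.2 or later) are assumptions made for the example, not part of the answer above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)      # highly collinear predictors
y = rng.binomial(1, 1 / (1 + np.exp(-(x1 + x2))))
X = np.column_stack([x1, x2])

# Standardize first so the L2 penalty shrinks all coefficients on the same scale
Xs = StandardScaler().fit_transform(X)

unpenalized = LogisticRegression(penalty=None, max_iter=1000).fit(Xs, y)
ridge = LogisticRegression(penalty="l2", C=0.1, max_iter=1000).fit(Xs, y)

print("unpenalized:", unpenalized.coef_)     # unstable under collinearity
print("ridge:      ", ridge.coef_)           # shrunken, more stable estimates
```

Note that this illustrates the caveat above: the penalty stabilizes the coefficient estimates under collinearity, but it does not make the collinear effects separately identifiable; the shrinkage simply picks one of many near-equivalent solutions.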
