Solved – Identifying multicollinearity of categorical variables in a logistic regression

categorical data, logistic, multicollinearity

I am fitting a logistic regression in which all of my independent variables are categorical. While some of the assumptions of linear regression can be relaxed for logistic regression, multicollinearity still needs to be checked in the sample data. How do I quantify the multicollinearity between several categorical variables? I looked through the existing questions and answers here; a few suggest fitting a linear regression to the same data, even though it fits poorly, and looking at the VIFs. Would that be enough, or are there other methods specific to this purpose? Any suggestion is highly appreciated. Thank you in advance.

Best Answer

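The VIF has been generalized to deal with logistic regression (assuming you mean a model with a binary dependent variable) and with terms that have more than one coefficient, such as categorical predictors; the result is the generalized VIF (GVIF). In R, you can compute it with the vif function in the car package.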
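To show what a GVIF measures for categorical predictors, here is a minimal sketch in plain Python rather than R. It dummy-codes each factor and applies the determinant formula GVIF = det(R11) * det(R22) / det(R), where R is the correlation matrix of all dummy columns, R11 the block for one factor, and R22 the block for the rest. The data (region, plan, zone) and the helper names are made up for illustration. Note the simplifying assumption: this version uses only the predictors, as in a linear model. As far as I know, car's vif on a fitted glm works from the correlation matrix of the coefficient estimates instead, so its numbers for a logistic fit can differ from these.

```python
from itertools import product
from math import sqrt


def dummy_code(values):
    """Treatment coding: one 0/1 column per level, dropping the first (reference) level."""
    levels = sorted(set(values))
    return [[1.0 if v == lev else 0.0 for v in values] for lev in levels[1:]]


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long numeric columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    norms = [sqrt(sum(x * x for x in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def det(m):
    """Determinant by Gaussian elimination with partial pivoting (1.0 for an empty matrix)."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[p][i]) < 1e-12:
            return 0.0  # numerically singular
        if p != i:
            a[i], a[p] = a[p], a[i]
            d = -d
        d *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return d


def submatrix(m, idx):
    return [[m[i][j] for j in idx] for i in idx]


def gvif(factors):
    """Map each factor name to (GVIF, Df, GVIF^(1/(2*Df))) from the dummy-coded predictors."""
    columns, owner = [], []
    for name, values in factors.items():
        for col in dummy_code(values):
            columns.append(col)
            owner.append(name)
    R = correlation_matrix(columns)
    det_all = det(R)
    out = {}
    for name in factors:
        own = [i for i, o in enumerate(owner) if o == name]
        rest = [i for i, o in enumerate(owner) if o != name]
        if det_all == 0.0:
            g = float("inf")  # perfect collinearity among the dummy columns
        else:
            g = det(submatrix(R, own)) * det(submatrix(R, rest)) / det_all
        df = len(own)
        out[name] = (g, df, g ** (1 / (2 * df)))
    return out


# Made-up illustration data: `region` and `plan` are fully crossed (balanced),
# while `zone` is a near copy of `region` that differs in three rows.
cells = list(product(["north", "south", "west"], ["basic", "plus"])) * 4
region = [r for r, _ in cells]
plan = [p for _, p in cells]
zone = [{"north": "A", "south": "B", "west": "C"}[r] for r in region]
zone[0], zone[7], zone[14] = "B", "C", "A"

result = gvif({"region": region, "plan": plan, "zone": zone})
for name, (g, df, adj) in result.items():
    print(f"{name:8s} GVIF={g:.3f}  Df={df}  GVIF^(1/(2*Df))={adj:.3f}")
```

In this toy data the two near-duplicate factors get large GVIFs while the factor that is balanced against the others stays close to 1. The last column, GVIF^(1/(2*Df)), is the one to compare across factors with different numbers of levels.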

As @RichardHardy has said, it is not a test, though. In the end you will get a GVIF for each term and will still need to make some subjective decisions. The thing to keep in mind is that high VIFs mean the standard errors of some of your estimates will be inflated, so effects that may be meaningful may not be detected as significant. The books and writings by John Fox, who also co-wrote the car package, are a great resource for understanding multicollinearity.
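To make the inflation concrete, consider the ordinary linear-regression case. It is used here only as an analogy; for a logistic model the relationship holds approximately. The estimated sampling variance of a coefficient factors as

$$\widehat{\operatorname{Var}}(b_j) = \frac{s^2}{(n-1)\,s_j^2} \cdot \frac{1}{1 - R_j^2}, \qquad \mathrm{VIF}_j = \frac{1}{1 - R_j^2}$$

where $s^2$ is the residual variance estimate, $s_j^2$ is the sample variance of predictor $j$, and $R_j^2$ comes from regressing predictor $j$ on the other predictors. The standard error therefore grows by a factor of $\sqrt{\mathrm{VIF}_j}$ compared with uncorrelated predictors: a VIF of 4 doubles it. For a categorical variable with $\mathrm{Df}$ dummy coefficients, the comparable quantity is $\mathrm{GVIF}^{1/(2\,\mathrm{Df})}$, which car reports next to the GVIF.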