Solved – Can VIF and backward elimination be used on a logistic regression model

I'm conducting a study on mandatory reports in the healthcare sector. I have a sample of 760 visits (690 individual patients). I will use a binary logistic regression model to see whether my independent variables affect whether a child is reported or not (yes/no). "Reported" will be set as the dependent variable. I've checked for multicollinearity using VIF.

I have two questions:

  1. Is it okay to use VIF when the variables are categorical (1/0)? If not, is there another way to check for multicollinearity?
  2. Can I use backward elimination to exclude variables that are not significant? Or is there a better way to select the model?

I'd really appreciate it if someone could help me!

Best Answer

Multicollinearity (and VIF) in logistic regression has already been discussed on this site, e.g., in "VIF calculation in regression" or "Binary Logistic Regression Multicollinearity Tests".

Can you use VIF with binary (0/1) variables? Why not? VIF depends only on the design matrix (the predictors, not the response), so no distributional assumptions are needed!
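To make that concrete, here is a minimal sketch that computes VIF directly from a design matrix of 0/1 columns, regressing each column on the others and taking 1 / (1 - R^2). The helper name `vif` and the simulated columns `a`, `b`, `c` are my own illustrative assumptions, not the data from the question; note that the outcome variable never enters the calculation.

```python
import numpy as np

def vif(X):
    """Return the variance inflation factor of each column of X.

    For column j, regress it on all other columns (plus an intercept)
    and compute 1 / (1 - R^2). Only the predictors are used.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Simulated binary predictors (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 760
a = rng.integers(0, 2, size=n)
b = rng.integers(0, 2, size=n)      # unrelated to a
flip = rng.random(n) < 0.10
c = np.where(flip, 1 - a, a)        # copy of a with about 10% of entries flipped

vifs = vif(np.column_stack([a, b, c]))
print(dict(zip(["a", "b", "c"], np.round(vifs, 2))))
```

In this setup `a` and `c` should show clearly inflated values while `b` stays close to 1, which is the same reading you would make with continuous predictors.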

Now the last question, about variable selection. This has been discussed a lot on this site, so it is worth searching. The short answer is NO: the best approach is to select your variables before looking at the data. If that for some reason is impossible (too many variables, ...), then think about regularization. Otherwise, look through this list, and especially "Variable selection for predictive modeling really needed in 2016?"
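As a rough illustration of what regularization can look like here, the sketch below fits an L1-penalized (lasso-type) logistic regression by proximal gradient descent in plain NumPy. Everything in it is an assumption made for illustration: the function names `soft_threshold` and `l1_logistic`, the penalty value `lam=0.03`, and the simulated data, in which only the first two of five binary predictors affect the outcome. In practice you would more likely use an established package and choose the penalty by cross-validation.

```python
import numpy as np

def soft_threshold(w, t):
    """Shrink each coefficient toward zero by t, setting small ones to exactly 0."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def l1_logistic(X, y, lam, n_iter=5000):
    """Minimise mean log-loss + lam * sum(|w|) by proximal gradient descent.

    The intercept b is left unpenalised.
    """
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    # Upper bound on the Lipschitz constant of the log-loss gradient,
    # used to pick a safe fixed step size.
    lipschitz = 0.25 * (np.linalg.norm(X, 2) ** 2 / n + 1.0)
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad_w = X.T @ (prob - y) / n
        grad_b = np.mean(prob - y)
        w = soft_threshold(w - step * grad_w, step * lam)
        b = b - step * grad_b
    return w, b

# Simulated data (hypothetical): 5 binary predictors, only the first two matter.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(760, 5)).astype(float)
logit = -1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1]
y = (rng.random(760) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

w, b = l1_logistic(X, y, lam=0.03)
print("coefficients:", np.round(w, 3), "intercept:", round(b, 3))
```

The point of the penalty is that weak predictors are shrunk toward (often exactly to) zero as part of the fit itself, rather than being dropped one at a time on the basis of p-values as in backward elimination.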