Solved – VIF calculation in regression

Tags: logistic, multicollinearity, regression, variance-inflation-factor

I want to use VIF to check the multicollinearity between some ordinal variables and continuous variables. When I put one variable as dependent and the other as independent, the regression gives one VIF value, and when I exchange these two, the VIF is different. In one case the VIF is higher than 3, and in the other it is less than 3.

Then, how do I decide whether to keep a variable, and which one should I keep? Ultimately, I am going to use these variables in a logistic regression. How important is it to check for multicollinearity in logistic regression?

Best Answer

It is important to address multicollinearity among all the explanatory variables together, because a group of three or more variables can be nearly linearly dependent even when no single pair of them is strongly correlated.
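A small simulation illustrates this point (the variables here are synthetic, not the asker's data): one predictor is built as roughly the average of four independent predictors, so its pairwise correlation with each of them is only moderate, yet the group as a whole is nearly collinear and the VIF is large.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Four mutually independent baseline predictors.
z = rng.standard_normal((n, 4))

# A fifth predictor that is almost the average of the other four,
# plus a little noise: jointly collinear with the group, but only
# moderately correlated with any single member of it.
x5 = z.mean(axis=1) + 0.1 * rng.standard_normal(n)

# Pairwise correlations of x5 with each z_j are moderate (about 0.5).
corr = np.corrcoef(np.column_stack([z, x5]), rowvar=False)
print(np.round(corr[4, :4], 2))

# Yet regressing x5 on z1..z4 gives a very high R^2, hence a large VIF.
A = np.column_stack([np.ones(n), z])          # design matrix with intercept
beta, *_ = np.linalg.lstsq(A, x5, rcond=None)
resid = x5 - A @ beta
r2 = 1 - resid.var() / x5.var()
vif = 1 / (1 - r2)
print(f"R^2 = {r2:.3f}, VIF = {vif:.1f}")
```

No pairwise correlation check would flag x5 here, but its VIF is far above any common threshold, which is exactly why VIF is computed against all the remaining predictors at once.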

The threshold for discarding explanatory variables with the Variance Inflation Factor is subjective. Here is a recommendation from The Pennsylvania State University (2014):

VIF is a measure of how much the variance of the estimated regression coefficient $b_k$ is "inflated" by the existence of correlation among the predictor variables in the model. A VIF of 1 means that there is no correlation among the $k^{th}$ predictor and the remaining predictor variables, and hence the variance of $b_k$ is not inflated at all. The general rule of thumb is that VIFs exceeding 4 warrant further investigation, while VIFs exceeding 10 are signs of serious multicollinearity requiring correction.
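Concretely, $VIF_k = 1/(1 - R^2_k)$, where $R^2_k$ comes from regressing the $k^{th}$ predictor on all the remaining predictors. Note that the response variable plays no role in this, and every predictor gets its own VIF. A minimal sketch with made-up predictors (x1 and x2 correlated by construction, x3 independent):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical predictors: x1 and x2 are correlated, x3 is independent.
x1 = rng.standard_normal(n)
x2 = 0.8 * x1 + 0.6 * rng.standard_normal(n)
x3 = rng.standard_normal(n)
X = np.column_stack([x1, x2, x3])

def vif(X, k):
    """VIF_k = 1 / (1 - R^2_k), where R^2_k is from regressing
    predictor k on all the remaining predictors (with an intercept)."""
    y = X[:, k]
    others = np.delete(X, k, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return 1 / (1 - r2)

for k in range(X.shape[1]):
    print(f"VIF of x{k + 1}: {vif(X, k):.2f}")
```

Here x1 and x2 each get an elevated VIF (they predict each other), while x3 stays near 1. The same numbers come from `statsmodels.stats.outliers_influence.variance_inflation_factor` if you prefer a library call; either way, each VIF belongs to one predictor within a fixed set, which is why swapping "dependent" and "independent" roles in ad hoc pairwise regressions is not how VIF is meant to be used.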

Remember to always stick to the hypotheses you formulated beforehand when investigating the relationships between the variables. Keep the predictors that make the most sense in explaining the response variable.

Multicollinearity is just as important in logistic regression as in other types of regression, since it affects the precision of the coefficient estimates in the same way. See: Logistic Regression - Multicollinearity Concerns/Pitfalls.