Solved – Is it normal for the intercept to have a 1600 Variance Inflation Factor (VIF)

feature-selection, python, regression, variance-inflation-factor

I'm using a Python module to calculate the VIF for the variables to be used in a binary logistic regression. I'm following this post exactly: https://etav.github.io/python/vif_factor_python.html.

With my data, I got a VIF of 1600+ for the intercept, which looks very strange to me (I have used VIF in R before but never seen this). Is this normal, or should I do something about it? The other variables seem normal, except for one with a slightly higher VIF.

To add more context, my response variable is highly imbalanced: it's mostly (~99%) 0 and only ~1% positive. I have a feeling this might be the cause, since the intercept column is all ones.


Any suggestions or help are welcome! Please let me know if you need more context.

Best Answer

Yes and no. It depends on the scale of the outcome, and on how close the grand mean is to zero (and whether the other covariates are centered).

If you're asking whether something is algorithmically wrong with that number, then the answer is likely no. It is a simple computation with an exact analytic solution.
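To see that it really is a simple, exact computation, the intercept's VIF can be reproduced with plain least squares: regress the column of ones on the other columns and apply 1/(1 − R²). Because that auxiliary regression has no constant of its own, the relevant R² is the uncentered one, which is what statsmodels' `variance_inflation_factor` computes for the intercept column. This is a sketch, assuming the constant is column 0:

```python
import numpy as np

def intercept_vif(X):
    """VIF of the constant column (assumed to be column 0) of design matrix X.

    Regress the column of ones on the remaining columns; the auxiliary
    regression has no constant of its own, so R^2 is the uncentered version.
    """
    ones = X[:, 0]
    Z = X[:, 1:]
    beta, *_ = np.linalg.lstsq(Z, ones, rcond=None)
    resid = ones - Z @ beta
    r2 = 1.0 - resid @ resid / (ones @ ones)
    return 1.0 / (1.0 - r2)

# Hypothetical example: one predictor whose values sit far from zero,
# so x/15 approximates the ones column and the intercept VIF is large.
X = np.column_stack([np.ones(100), np.linspace(10, 20, 100)])
print(intercept_vif(X))
```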

The number is closely related to the idea of removing the intercept from the model. Removing the intercept makes very little sense in most cases, as evidenced by this apparently large and meaningless number: a VIF of 1600 tells you how much more variable the fit would be if you removed the grand mean from among the predictors of the outcome.

Also related: When is it OK to remove the intercept in a linear regression model?