Variance Inflation Factor – Should VIF Removal Be Done Recursively in Regression Analysis?

multicollinearity · regression · variance-inflation-factor

This is a pretty straightforward question and I guess I might get downvoted for it (I was so happy improving my reputation here, lol), but I couldn't find the answer anywhere, and even though I believe I know it, I'd like to check with more senior data scientists.

When dealing with linear regression, it's often recommended to remove features that present multicollinearity so that we get correct interpretability (even though multicollinearity doesn't bias the estimates). The usual tool for this is the Variance Inflation Factor (VIF). Should we do this recursively? I mean, should I compute VIF, remove some features, run it again, remove others, and so on? Is there anything we need to check besides VIF in this process?

Best Answer

You can remove them recursively, and I think it is advisable to do so. I don't have a paper to support this idea, but consider the following rationale. VIF checks whether one input is correlated with at least one other input; it doesn't tell you, however, which other inputs it is correlated with. You might have a single group of correlated variables, or several. Also, after you remove one input you have to recompute VIF for all remaining inputs, because the input you removed might have been involved in their collinearity. The only reasonable way I found is to do multiple iterations: at each iteration, compute VIF for every feature; if no feature's value exceeds your removal threshold, stop; otherwise, remove the one with the most extreme value and iterate again. For me this produced more than reasonable results.
