Solved – Understanding VIF: are only the variables with VIF > 4 multicollinear, and the others not?

linear, multicollinearity, pca, regression, variance-inflation-factor

I am trying to understand whether there is multicollinearity among a few variables. I took some sample data and looked at the Variance Inflation Factor (VIF) results; as a rule of thumb, VIF > 4 indicates multicollinearity.

From the table below, can I say that only Q5, Q6, and Q7 are multicollinear and the rest are not?

   Variables      VIF
1         Q1 3.294284
2         Q2 2.500329
3         Q3 3.229811
4         Q4 1.498705
5         Q5 4.833235
6         Q6 5.798955
7         Q7 4.183958
8         Q8 3.201985
9         Q9 3.159585
10       Q10 2.824077

Can I pass Q5, Q6, and Q7 to PCA, take the resulting component together with the raw Q1, Q2, Q3, Q4, Q8, Q9, and Q10, and run my regression on those? Does this make sense?

Best Answer

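The VIF for a given predictor variable tells you to what degree that variable is correlated with a linear combination of all the other predictors. Concretely, VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing predictor j on all the other predictors. A high VIF therefore flags a variable as being involved in a near-linear dependency, but it does not by itself tell you which of the other variables share that dependency.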
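To make that definition concrete, here is a minimal sketch that computes the VIF by regressing each predictor on all the others. The function name `vif` and the synthetic columns `a`, `b`, `c`, `d` are made up for illustration; they are not the Q1 to Q10 data from the question.

```python
import numpy as np

def vif(X):
    """VIF of each column of X (rows = observations, columns = predictors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress predictor j on an intercept plus all the other predictors.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic data (an assumption for this sketch): c is nearly a + b,
# while d is unrelated to the other three columns.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + b + 0.3 * rng.normal(size=500)
d = rng.normal(size=500)

vifs = vif(np.column_stack([a, b, c, d]))
for name, value in zip(["a", "b", "c", "d"], vifs):
    print(f"{name}: {value:.2f}")
```

In this toy example `a` and `b` are independent of each other, yet both get a high VIF because each can be recovered from `c` and the other one. That is the sense in which a VIF describes a variable's relationship to the whole set of predictors rather than to any single one.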

So, you don't know for sure that Q5, Q6, and Q7 are the only predictors causing multicollinearity in your model, but removing the predictors with a high VIF one at a time and re-running the model can help you figure out which predictors would be most beneficial to remove.
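The one-at-a-time removal could be sketched roughly as follows. This is only an illustration under stated assumptions: `drop_high_vif` is a hypothetical helper, the threshold of 4 is the rule of thumb quoted in the question, and the data are synthetic rather than the Q1 to Q10 variables.

```python
import numpy as np

def vif(X):
    """VIF of each column of X, from regressing it on the other columns."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        # VIF = 1 / (1 - R^2) = total sum of squares / residual sum of squares
        out[j] = ((y - y.mean()) ** 2).sum() / (resid @ resid)
    return out

def drop_high_vif(X, names, threshold=4.0):
    """Repeatedly drop the predictor with the largest VIF above the threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    dropped = []
    while X.shape[1] > 1:
        v = vif(X)                      # recompute after every removal
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        dropped.append(names.pop(worst))
        X = np.delete(X, worst, axis=1)
    return names, dropped

# Synthetic data (an assumption for this sketch): c is nearly a + b.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + b + 0.3 * rng.normal(size=500)
d = rng.normal(size=500)

kept, dropped = drop_high_vif(np.column_stack([a, b, c, d]), ["a", "b", "c", "d"])
print("kept:", kept)
print("dropped:", dropped)
```

Recomputing the VIFs after each removal matters: dropping one variable can bring the VIFs of several others back below the threshold, so removing every flagged variable at once may discard more than necessary. A purely mechanical rule like this is only a starting point; subject-matter knowledge about the variables should still guide which one to drop.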

If you have some understanding of what these variables represent, that can help you decide which ones to keep in your model.
