In multiple linear regression, the correlation matrix of the predictors already indicates the strength of the correlation between any two predictors. Why do we need other tools, such as the VIF, to check for multicollinearity?
Solved – What information VIF can provide but correlation matrix cannot in detecting multicollinearity
multicollinearity, multiple-regression, regression
Related Questions
- Solved – Regression model constant causes multicollinearity warning, but not in standardized model
- Regression – Predictor Flipping Sign in Regression with No Multicollinearity Explained
- Correlation vs. VIF in OLSR Model: Understanding Multicollinearity Decision Factors
- Multicollinearity – Determinant Correlation Matrix Equals Zero
Best Answer
VIF can help identify multicollinearity, i.e. the case where one variable is strongly correlated with a linear combination (weighted sum) of several variables. This cannot necessarily be detected by looking at individual correlations. As @gung's answer to this question (which asks a few too many questions at once) says:
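To make this concrete, here is a minimal sketch (using only numpy, with made-up simulated data) of the textbook definition $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the remaining predictors. The third predictor is constructed as a near-linear combination of the first two, so no single pairwise correlation looks alarming, yet the VIFs are large:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# x3 is almost exactly x1 + x2: multicollinear without any extreme pairwise correlation
x3 = x1 + x2 + rng.normal(scale=0.1, size=n)
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the others."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

print(np.corrcoef(X, rowvar=False).round(2))       # pairwise correlations: none near 1
print([round(vif(X, j), 1) for j in range(3)])     # VIFs: all far above common cutoffs
```

Here the pairwise correlations of x1 and x2 with x3 sit around 0.7, which many analysts would not flag, while every VIF is in the tens or hundreds because each predictor is nearly a linear combination of the other two.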
Here's a particular example: suppose you are trying to model or predict an outcome from a set of compositional variables, i.e. you know the strength of someone's preference for brand A, brand B, brand C, ..., or brand Z, and these preferences add up to 1 overall (by definition/construction), and you want to use "prefers brand *" as a set of predictors in the same model. Preferences for particular pairs of brands may be either positively or negatively correlated, but the set of predictors as a whole contains only 25, not 26, pieces of information. So the correlation between $(A+B+\ldots+Y)$ and $Z$ (or between any one preference and the sum of all the other preferences) is exactly $-1$, even though the correlations of individual brand preferences with $Z$ can be all over the place.
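The compositional example above can be simulated directly. This is a hedged sketch with invented data: 6 hypothetical "brand shares" drawn from a symmetric Dirichlet distribution (rather than 26 brands), so each row sums to 1 by construction. Every pairwise correlation is modest, yet the sum of all other shares is perfectly (negatively) correlated with the last one:

```python
import numpy as np

rng = np.random.default_rng(1)
# 400 respondents, 6 brand-preference shares per row, each row summing to 1
P = rng.dirichlet(np.ones(6), size=400)

# Pairwise correlations: each is around -1/(k-1) = -0.2, individually unremarkable
C = np.corrcoef(P, rowvar=False)
print(C.round(2))

# But each share is an exact linear function of the rest:
# corr(sum of the other shares, last share) = corr(1 - Z, Z) = -1
others_sum = P[:, :-1].sum(axis=1)
print(np.corrcoef(others_sum, P[:, -1])[0, 1])  # -1.0 (up to floating-point rounding)
```

Regressing any one share on the others here gives $R^2 = 1$, so its VIF is infinite, which is exactly the multicollinearity that no entry of the pairwise correlation matrix reveals.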