Solved – VIF (Variance Inflation Factor) and correlation in linear regression

Linear regression: $Y = X_1 + X_2$

Is it possible for $X_1$ to have a low VIF ($1.25$) and, at the same time, a correlation of $0.35$ with $X_2$? If the correlation between $X_1$ and $X_2$ is almost 1, does that imply that the VIF for $X_1$ will be higher?

Best Answer

No. In this particular case with two independent variables it is not possible.

$Y = \beta_1 X_1 + \beta_2 X_2 + \epsilon$


The VIF is calculated in a three-step procedure:

  1. Run an OLS regression of $X_1$ on $X_2$ (the auxiliary regression)

$X_1 = c_0 + \alpha X_2 + \epsilon$

  2. Calculate the VIF, where $R^2_i$ is the $R^2$ of the auxiliary regression for $X_i$

$VIF_i = \frac{1}{1-R^2_{i}}$

  3. Analyze the VIF. What counts as a large VIF? Some people say > 4, some > 10, some > 15.
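The three steps can be sketched in a few lines of plain Python. This is only an illustration for the two-predictor case: the helper names (`r_squared_simple`, `vif_two_predictors`) and the simulated data are assumptions made for the example, not part of the original answer.

```python
import random

def mean(values):
    return sum(values) / len(values)

def r_squared_simple(y, x):
    """R^2 of the OLS regression y = c0 + alpha * x + error."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    alpha = sxy / sxx            # OLS slope
    c0 = my - alpha * mx         # OLS intercept
    ss_res = sum((yi - (c0 + alpha * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif_two_predictors(target, other):
    """Steps 1 and 2: regress `target` on `other`, then VIF = 1 / (1 - R^2)."""
    return 1.0 / (1.0 - r_squared_simple(target, other))

# Simulated (hypothetical) predictors with a moderate linear dependence.
random.seed(0)
x2 = [random.gauss(0, 1) for _ in range(500)]
x1 = [0.5 * b + random.gauss(0, 1) for b in x2]

vif_x1 = vif_two_predictors(x1, x2)
vif_x2 = vif_two_predictors(x2, x1)
print(f"VIF(X1) = {vif_x1:.3f}, VIF(X2) = {vif_x2:.3f}")
```

With only two predictors both VIFs coincide, because regressing $X_1$ on $X_2$ and $X_2$ on $X_1$ give the same $R^2$. Step 3 is then a judgment call against whichever threshold you prefer.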

While the correlation is computed in the following way.

$\rho_{x,y} = corr(x,y) = \frac{cov(x,y)}{\sigma_x \sigma_y} = \frac{E[(x-\mu_x)(y-\mu_y)]}{\sigma_x \sigma_y}$

where $\sigma_x$ and $\sigma_y$ are the standard deviations of $x$ and $y$.
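For reference, the definition translates directly into code. The sketch below uses population moments (dividing by $n$) and a small made-up height/weight sample; the function name and the numbers are illustrative assumptions only.

```python
import math

def pearson_correlation(x, y):
    """Correlation as covariance divided by the product of standard deviations."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    sigma_x = math.sqrt(sum((a - mu_x) ** 2 for a in x) / n)
    sigma_y = math.sqrt(sum((b - mu_y) ** 2 for b in y) / n)
    return cov_xy / (sigma_x * sigma_y)

# Hypothetical sample: heights in cm and weights in kg.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
weights = [52.0, 58.0, 69.0, 77.0, 90.0]
rho = pearson_correlation(heights, weights)
print(f"correlation = {rho:.3f}")
```

Whether you divide by $n$ or by $n-1$ does not matter here, since the factor cancels between the numerator and the denominator.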

As a rule of thumb, you should not worry if the correlation is between -0.5 and 0.5. Some people even say that a correlation of up to 0.7 or 0.8 in absolute value is no major problem.


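Note that both measures capture only the linear relationship between $X_1$ and $X_2$, so they cannot give completely different answers. With only two predictors, the $R^2$ of the auxiliary regression is simply the squared correlation $\rho^2$, so $VIF = \frac{1}{1-\rho^2}$.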
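As a quick check, the numbers from the question can be plugged into the two-predictor relation $VIF = \frac{1}{1-\rho^2}$ (which holds because the auxiliary $R^2$ equals the squared correlation in this case):

$VIF = \frac{1}{1-0.35^2} = \frac{1}{0.8775} \approx 1.14$

Conversely, a VIF of $1.25$ requires $\rho^2 = 1 - \frac{1}{1.25} = 0.2$, i.e. $|\rho| \approx 0.45$. So a correlation of $0.35$ and a VIF of $1.25$ cannot occur together with only two predictors. As $|\rho|$ approaches 1 the denominator goes to zero and the VIF grows without bound; for example, $\rho = 0.95$ gives $VIF = \frac{1}{0.0975} \approx 10.26$.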


If the correlation and the VIF still seem somewhat contradictory, I propose the following checks.

  1. What if you eliminate a variable? Do these regressions yield different results? If yes, the variables might be correlated.

$Y = \beta_1 X_1 + \epsilon$

$Y = \beta_2 X_2 + \epsilon$

  2. Apply a ridge regression, which is more robust to multicollinearity than an OLS regression. If the results differ, there might be multicollinearity.
  3. Are the variables logically related? E.g. if the two variables are the weight and height of people, then you already know without a regression that tall people are presumably heavier.
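The first two checks can be tried on simulated data. The following is a minimal sketch, assuming zero-mean variables so that the no-intercept models above apply. The helper functions, the ridge penalty of 10, and the data-generating process are all assumptions chosen for illustration.

```python
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def fit_two_predictors(x1, x2, y, ridge_penalty=0.0):
    """Solve (X'X + penalty * I) beta = X'y for two predictors, no intercept.

    A penalty of 0 gives plain OLS; a positive penalty gives ridge regression.
    """
    a11 = dot(x1, x1) + ridge_penalty
    a22 = dot(x2, x2) + ridge_penalty
    a12 = dot(x1, x2)
    b1, b2 = dot(x1, y), dot(x2, y)
    det = a11 * a22 - a12 * a12
    beta1 = (a22 * b1 - a12 * b2) / det
    beta2 = (a11 * b2 - a12 * b1) / det
    return beta1, beta2

def fit_one_predictor(x, y):
    """OLS slope of y on a single predictor, no intercept."""
    return dot(x, y) / dot(x, x)

# Hypothetical data: X2 is almost a copy of X1, and the true model is Y = X1 + X2 + noise.
random.seed(1)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.1) for a in x1]
y = [a + b + random.gauss(0, 1) for a, b in zip(x1, x2)]

ols_both = fit_two_predictors(x1, x2, y)
ols_x1_only = fit_one_predictor(x1, y)
ols_x2_only = fit_one_predictor(x2, y)
ridge_both = fit_two_predictors(x1, x2, y, ridge_penalty=10.0)

print("OLS, both predictors:  ", ols_both)
print("OLS, X1 only:          ", ols_x1_only)
print("OLS, X2 only:          ", ols_x2_only)
print("Ridge, both predictors:", ridge_both)
```

In this setup each single-predictor regression absorbs the effect of the omitted variable, so its coefficient lands near the sum of the two true coefficients rather than near either one. A gap like that between the reduced and the full model, or between OLS and ridge, is the kind of difference the checks above look for.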