The function you requested comes in the R package {car}.
I tried to figure this out by running some regression models on the mtcars dataset in R.
Evidently, I can get the VIF both with the function and manually when the regressor is a continuous variable:
require(car)
attach(mtcars)
fit1 <- lm(mpg ~ wt + hp + disp) # The model we want.
fit_wt <- lm(wt ~ hp + disp) # Regressing wt against other regressors.
rsq_wt <- summary(fit_wt)$r.square # R-squared of the auxiliary model
(v_wt <- 1/(1 - (rsq_wt))) # Actual formula for VIF
vif(fit1) # R built-in function
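The same manual calculation generalizes to every regressor. Here is a small base-R sketch (no attach(), and the helper name manual_vif is my own invention, not part of any package) that computes the VIF by hand for each of the three regressors; the results should match car::vif(fit1):

```r
# Manual VIF for each regressor of fit1, base R only.
fit1 <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Hypothetical helper: regress one predictor on the others and apply 1/(1 - R^2).
manual_vif <- function(var, others, data) {
  f <- reformulate(others, response = var)        # e.g. wt ~ hp + disp
  rsq <- summary(lm(f, data = data))$r.squared    # R-squared of the auxiliary model
  1 / (1 - rsq)
}

vifs <- c(
  wt   = manual_vif("wt",   c("hp", "disp"), mtcars),
  hp   = manual_vif("hp",   c("wt", "disp"), mtcars),
  disp = manual_vif("disp", c("wt", "hp"),   mtcars)
)
vifs  # should agree with car::vif(fit1)
```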
Now for the real question, here is what I find. Say your regressor is am, the categorical variable for the car's transmission type (automatic versus manual). Ordinarily, you would fit a model such as:
fit2 <- lm(mpg ~ wt + disp + as.factor(am))
The problem is that if you now try to get the VIF for am by just reshuffling the regressors, you get warnings:
fit_am <- lm(as.factor(am) ~ wt + disp)
Warning messages:
1: In model.response(mf, "numeric") :
using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : - not meaningful for factors
Game over? Not quite... Look what happens if I treat am as continuous:
> fit2 <- lm(mpg ~ wt + disp + as.factor(am))
> fit_am <- lm(am ~ wt + disp)
> rsq_am <- summary(fit_am)$r.square
> (v_am <- 1/(1 - (rsq_am)))
[1] 1.931264
> vif(fit2)
wt disp as.factor(am)
5.939675 4.752561 1.931264
We get the same value manually as with the R built-in function vif.
First, I think it is better to use condition indexes rather than VIFs to diagnose collinearity. See the work of David Belsley or even (if you want a soporific) my dissertation (that link seems to have vanished; this one should work, I hope).
However, to get to your question: it is possible to have very low correlations among all variables yet perfect collinearity. If you have 11 independent variables, 10 of which are mutually independent and the 11th is the sum of the other 10, then every pairwise correlation will be modest but the collinearity is perfect. So high VIF does not imply high correlations.
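The sum-of-ten-independents example can be simulated directly. A minimal sketch (the seed and sample size are arbitrary choices of mine): the pairwise correlations among the first ten variables stay small, yet the auxiliary regression of the 11th on the rest has an R-squared of essentially 1, so its VIF blows up.

```r
# Ten independent standard normals plus their sum as an 11th variable.
set.seed(1)                       # arbitrary seed for reproducibility
n <- 1000
X <- as.data.frame(matrix(rnorm(n * 10), n, 10))
X$x11 <- rowSums(X)               # the 11th variable is the sum of the other 10

# Largest pairwise correlation among the 10 independent variables: small.
max(abs(cor(X[1:10]))[upper.tri(diag(10))])

# Auxiliary regression of x11 on the others: R-squared is essentially 1,
# so 1/(1 - R^2) is enormous -- perfect collinearity despite low correlations.
rsq <- summary(lm(x11 ~ ., data = X))$r.squared
1 / (1 - rsq)
```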
It is also true that you can have pretty high correlations without it creating troublesome collinearity, but this is trickier to show. See the references.
Best Answer
No. In this particular case with two independent variables it is not possible.
$Y = \beta_1 X_1 + \beta_2 X_2 + \epsilon$
The VIF is calculated in a three-step procedure: regress each regressor on the others,
$X_1 = c_0 + \alpha X_2 + \epsilon,$
take the $R^2_i$ of this auxiliary regression, and compute
$VIF_i = \frac{1}{1-R^2_i}.$
The correlation, by contrast, is computed as
$\rho_{X,Y} = corr(X,Y) = \frac{cov(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}.$
You should not worry if the correlation is between $-0.5$ and $0.5$; some people even say that correlations up to about $\pm 0.7$ or $\pm 0.8$ are no major problem.
Note that, with only two regressors, both measures capture the same linear relationship between $X_1$ and $X_2$, so they cannot yield completely contradictory results.
If the correlation and the VIF nevertheless seem contradictory, I propose estimating the following separate regressions and comparing their coefficients with those of the joint model:
$Y = \beta_1 X_1 + \epsilon$
$Y = \beta_2 X_2 + \epsilon$
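A minimal sketch of this check in R, using mtcars purely as an illustrative data set (wt and disp stand in for $X_1$ and $X_2$; this pairing is my assumption, not part of the answer above). If the slopes shift substantially between the separate fits and the joint fit, collinearity between the two regressors is doing real work:

```r
# Joint model and the two separate single-regressor models.
fit_joint <- lm(mpg ~ wt + disp, data = mtcars)  # Y = b1*X1 + b2*X2 + e
fit_x1    <- lm(mpg ~ wt,   data = mtcars)       # Y = b1*X1 + e
fit_x2    <- lm(mpg ~ disp, data = mtcars)       # Y = b2*X2 + e

# Compare how much each slope moves once the other regressor enters.
coef(fit_joint)
coef(fit_x1)
coef(fit_x2)
```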