Solved – Multicollinearity between two categorical variables

multicollinearity, r

Is the variance inflation factor (VIF) also applicable for testing multicollinearity between two categorical variables? What is the use of the Spearman test? How can I do this in R?

Best Answer

The generalized VIF (GVIF) is your friend. See this example:

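# Toy data: a numeric response and three categorical predictors.
# No seed is set, so the y values will differ from run to run;
# the GVIFs depend only on the predictors, not on y.
data1 <- data.frame(
  y  = rnorm(8),
  x1 = factor(LETTERS[c(1, 1, 1, 2, 2, 2, 3, 3)]),
  x2 = factor(letters[c(1, 1, 2, 1, 2, 3, 2, 3)]),
  x3 = factor(rep(c('one', 'two'), 4))
)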

data1

            y x1 x2  x3
1 -1.20757109  A  a one
2 -0.92517490  A  a two
3 -1.97064426  A  b one
4  0.91072507  B  a two
5  0.82909639  B  b one
6  0.04714072  B  c two
7 -1.00678648  C  b one
8 -0.08177810  C  c two

library(car)

vif(lm(y ~ x1 + x2 + x3, data = data1))

       GVIF Df GVIF^(1/(2*Df))
x1 1.800000  2        1.158292
x2 4.950000  2        1.491596
x3 3.466667  1        1.861899
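
The last column is just the first two combined, which is easy to check by hand. The sketch below uses base R only and copies the GVIF and Df values from the output above; the names gvif, df and adj are made up for illustration. Squaring the adjusted value puts it back on the scale of an ordinary VIF, so the usual rules of thumb can be applied to it, though any cutoff is a judgment call.

```r
# GVIF and Df values copied from the vif() output above
gvif <- c(x1 = 1.8, x2 = 4.95, x3 = 3.466667)
df   <- c(x1 = 2,   x2 = 2,    x3 = 1)

# Dimension-adjusted value, i.e. the third column of the output
adj <- gvif^(1 / (2 * df))
print(round(adj, 6))

# Squared, it is on the same scale as an ordinary VIF;
# for a 1-df term such as x3 this is simply the GVIF itself
print(round(adj^2, 4))
```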

For details on GVIF, see ?vif, which says:

If all terms in an unweighted linear model have 1 df, then the usual variance-inflation factors are calculated.

If any terms in an unweighted linear model have more than 1 df, then generalized variance-inflation factors (Fox and Monette, 1992) are calculated. These are interpretable as the inflation in size of the confidence ellipse or ellipsoid for the coefficients of the term in comparison with what would be obtained for orthogonal data.

The generalized vifs are invariant with respect to the coding of the terms in the model (as long as the subspace of the columns of the model matrix pertaining to each term is invariant). To adjust for the dimension of the confidence ellipsoid, the function also prints GVIF^[1/(2*df)] where df is the degrees of freedom associated with the term.
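
To see what the GVIF column measures, it can be reproduced without car. The Fox and Monette definition is GVIF = det(R11) * det(R22) / det(R), where R is the correlation matrix of the dummy-coded regressors (intercept excluded), R11 is the block for the columns of the term in question and R22 is the block for all other columns. This is a minimal sketch in base R under that definition, not car's own code; X_data, term_id and gvif_manual are illustrative names, and the design is the same as data1 above with y left out, since y plays no role.

```r
# Same predictors as data1 above; y is not needed for GVIF
X_data <- data.frame(
  x1 = factor(LETTERS[c(1, 1, 1, 2, 2, 2, 3, 3)]),
  x2 = factor(letters[c(1, 1, 2, 1, 2, 3, 2, 3)]),
  x3 = factor(rep(c("one", "two"), 4))
)

# Dummy-coded model matrix; record which term each column belongs to,
# then drop the intercept column
mm      <- model.matrix(~ x1 + x2 + x3, data = X_data)
term_id <- attr(mm, "assign")[-1]
X       <- mm[, -1, drop = FALSE]
R       <- cor(X)

# GVIF for one term: det(R11) * det(R22) / det(R)
gvif_manual <- function(term) {
  idx <- which(term_id == term)
  det(R[idx, idx, drop = FALSE]) *
    det(R[-idx, -idx, drop = FALSE]) / det(R)
}

gvif <- sapply(1:3, gvif_manual)
names(gvif) <- c("x1", "x2", "x3")
print(round(gvif, 6))
```

The result should agree with the GVIF column printed by vif() above, up to rounding. For x3, which has a single degree of freedom, the formula reduces to the ordinary VIF, 1 / (1 - R^2), from regressing its dummy column on the other predictors.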