Solved – Dealing with multicollinearity


I have learned that, using the vif() function from the car package, we can compute the degree of multicollinearity of the inputs in a model. According to Wikipedia, if the VIF value is greater than 5, the input can be considered to suffer from multicollinearity. For example, I have developed a linear regression model using lm(), and vif() gives the following output. As we can see, the inputs ub, lb, and tb are suffering from multicollinearity.

    > vif(lrmodel)
           tb        ub        lb        ma        ua        mb        sa        sb 
     7.929757 50.406318 30.826721  1.178124  1.891218  1.364020  2.113797  2.357946 
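
For reference, here is roughly how the model and the VIF values above are obtained (a sketch; dat and y are placeholders for my actual data frame and response):

    # sketch of the workflow; 'dat' and 'y' are placeholders for my data
    library(car)                                    # provides vif()
    lrmodel <- lm(y ~ tb + ub + lb + ma + ua + mb + sa + sb, data = dat)
    vif(lrmodel)                                    # variance inflation factor per input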

In order to avoid the multicollinearity problem, and thus to make my model more robust, I have taken the interaction between ub and lb, and the VIF table of the new model is now as follows:

          tb    ub:lb       ma       mb       sa       sb       ua 
    1.763331 1.407963 1.178124 1.327287 2.113797 1.860894 1.891218 

There is not much difference in the R^2 values, nor in the errors from leave-one-out CV tests, between the two cases.
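
A rough sketch of the leave-one-out comparison I ran (dat and y are again placeholders for my data frame and response):

    # leave-one-out CV sketch; 'dat' and 'y' are placeholders for my data
    loo_rmse <- function(formula, data) {
        errs <- sapply(seq_len(nrow(data)), function(i) {
            fit <- lm(formula, data = data[-i, ])            # refit without row i
            data$y[i] - predict(fit, newdata = data[i, ])    # held-out prediction error
        })
        sqrt(mean(errs^2))
    }
    loo_rmse(y ~ tb + ub + lb + ma + ua + mb + sa + sb, dat)   # original model
    loo_rmse(y ~ tb + ub:lb + ma + ua + mb + sa + sb, dat)     # model with ub:lb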

My questions are:

  1. Is it fine to avoid the multicollinearity problem by taking the interaction as shown above?

  2. Is there a nicer way to present the multicollinearity problem than the vif() results above?

Please provide your suggestions.

Thanks.

Best Answer

You seem to include the interaction term ub:lb, but not ub and lb themselves as separate predictors. This violates the so-called "principle of marginality", which states that a model containing a higher-order term (such as an interaction) should also include the lower-order terms it is built from (see Wikipedia for a start). Effectively, you are now including a predictor that is just the element-wise product of ub and lb.
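
To make the distinction concrete, here is a sketch in R (dat and y stand in for your data frame and response, which I don't know):

    # 'dat' and 'y' are placeholders for your data frame and response
    m1 <- lm(y ~ ub:lb + tb + ma + ua + mb + sa + sb, data = dat)   # interaction only
    m2 <- lm(y ~ ub*lb + tb + ma + ua + mb + sa + sb, data = dat)   # ub + lb + ub:lb
    # ub*lb expands to ub + lb + ub:lb, so m2 respects the principle of marginality,
    # whereas m1 just adds the element-wise product of ub and lb as a new predictor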

$VIF_{j}$ is just $\frac{1}{1-R_{j}^{2}}$, where $R_{j}^{2}$ is the $R^{2}$ value you get when you regress your original predictor variable $j$, as criterion, on all remaining predictors (it is also the $j$-th diagonal element of $R_{x}^{-1}$, the inverse of the correlation matrix of the predictors). A VIF value of 50 thus indicates that you get an $R^{2}$ of .98 when predicting ub from the other predictors, which means that ub is almost completely redundant (the same holds for lb, with an $R^{2}$ of .97).
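
As a sketch of what that means in R (dat again stands in for your data frame):

    # 'dat' is a placeholder for your data frame
    r2.ub <- summary(lm(ub ~ tb + lb + ma + ua + mb + sa + sb, data = dat))$r.squared
    1 / (1 - r2.ub)                    # should reproduce vif(lrmodel)["ub"], about 50

    # equivalently, the diagonal of the inverse correlation matrix of the predictors
    X <- dat[, c("tb", "ub", "lb", "ma", "ua", "mb", "sa", "sb")]
    diag(solve(cor(X)))                # all VIFs at once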

I would start by computing all pairwise correlations between the predictors and running the aforementioned regressions to see which variables predict ub and lb, to check whether the redundancy is easily explained. If so, you can remove the redundant predictors. You can also look into ridge regression (lm.ridge() from package MASS in R).
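
A sketch of those steps (dat and y are placeholders; the lambda grid is arbitrary):

    # 'dat' and 'y' are placeholders for your data frame and response
    X <- dat[, c("tb", "ub", "lb", "ma", "ua", "mb", "sa", "sb")]
    round(cor(X), 2)                                   # pairwise predictor correlations

    # regression of ub on the remaining predictors (analogous for lb)
    summary(lm(ub ~ tb + lb + ma + ua + mb + sa + sb, data = dat))

    # ridge regression over an arbitrary lambda grid
    library(MASS)
    rfit <- lm.ridge(y ~ tb + ub + lb + ma + ua + mb + sa + sb,
                     data = dat, lambda = seq(0, 10, by = 0.1))
    select(rfit)                                       # HKB, L-W and GCV choices of lambda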

More advanced multicollinearity diagnostics use the eigenvalue structure of $X^{t}X$, where $X$ is the design matrix of the regression (i.e., all predictors as column vectors). The condition number $\kappa$ is $\frac{\sqrt{\lambda_{max}}}{\sqrt{\lambda_{min}}}$, where $\lambda_{max}$ and $\lambda_{min}$ are the largest and smallest ($\neq 0$) eigenvalues of $X^{t}X$. In R, you can use kappa(lm(<formula>)), where the lm() model typically uses the standardized variables.
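
A sketch of both routes (dat and y again as placeholders):

    # 'dat' and 'y' are placeholders; variables standardized before computing kappa
    kappa(lm(scale(y) ~ scale(tb) + scale(ub) + scale(lb) + scale(ma) +
             scale(ua) + scale(mb) + scale(sa) + scale(sb), data = dat))

    # or directly from the eigenvalues of X'X of the standardized predictors
    X  <- scale(dat[, c("tb", "ub", "lb", "ma", "ua", "mb", "sa", "sb")])
    ev <- eigen(crossprod(X))$values                # eigenvalues of X'X
    sqrt(max(ev) / min(ev))                         # condition number kappa
    # (the two numbers can differ somewhat: kappa() uses a quick estimate by default,
    # and the lm() design matrix also includes the intercept column)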

Geometrically, $\kappa$ gives you an idea about the shape of the data cloud formed by the predictors. With 2 predictors, the scatterplot might look like an ellipse with 2 main axes. $\kappa$ then tells you how "flat" that ellipse is, i.e., it is a measure of the ratio of the length of the largest main axis to the length of the smallest one. With 3 predictors, you might have a cigar shape, and 3 main axes. The "flatter" your data cloud is in some direction, the more redundant the variables are when taken together.

There are some rules of thumb for uncritical values of $\kappa$ (I have heard less than 20). But be advised that $\kappa$ is not invariant under data transformations that just change the unit of the variables, such as standardizing. This is unlike VIF: vif(lm(y ~ x1 + x2)) will give you the same result as vif(lm(scale(y) ~ scale(x1) + scale(x2))) (as long as there are no multiplicative terms in the model), but kappa(lm(y ~ x1 + x2)) and kappa(lm(scale(y) ~ scale(x1) + scale(x2))) will almost surely differ.
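
A quick way to see this on simulated data (a sketch, not your data):

    # small simulated example with two highly correlated predictors
    set.seed(1)
    x1 <- rnorm(100)
    x2 <- x1 + rnorm(100, sd = 0.1)
    y  <- x1 + x2 + rnorm(100)

    library(car)
    vif(lm(y ~ x1 + x2))                            # identical ...
    vif(lm(scale(y) ~ scale(x1) + scale(x2)))       # ... to this
    kappa(lm(y ~ x1 + x2))                          # but this ...
    kappa(lm(scale(y) ~ scale(x1) + scale(x2)))     # ... and this will differ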