Solved – High correlation between two independent variables, but no multicollinearity

correlation, multicollinearity, regression, stata

I have two independent variables which have a Pearson correlation coefficient of 0.98.

The two independent variables measure the same underlying construct at two different points in time (one is a forecast, the other the actual realization). The VIF is around 25. However, if I replace one variable with the incremental change relative to the other variable, I get basically the same information from the coefficients (a base coefficient and an incremental value), but the VIF drops to around 1 and the multicollinearity appears to be gone. Does that automatically mean that the initial regression has no multicollinearity problem?

Best Answer

Let XF and XA be, respectively, the forecast of X and the actual value of X. If the forecast is good enough, the correlation between XF and XA will be high, as it is here. You use the incremental change E, which I assume is defined as E = XF - XA.
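As a rough check on the numbers in the question: in a regression with exactly two regressors and an intercept, both share the same VIF, 1 / (1 - r^2), where r is their Pearson correlation. The sketch below applies that to r = 0.98 and then to simulated data. The simulated values, the sample size, and the forecast-error standard deviation of 0.2 are assumptions chosen for illustration, not the asker's data.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_regressors(r):
    """VIF shared by both regressors when the model has exactly two (plus an intercept)."""
    return 1.0 / (1.0 - r ** 2)

# Correlation quoted in the question: 1 / (1 - 0.9604), about 25.25
vif_from_question = vif_two_regressors(0.98)

# Hypothetical data: actual values plus a small, independent forecast error
random.seed(0)
xa = [random.gauss(0, 1) for _ in range(2000)]
e = [random.gauss(0, 0.2) for _ in range(2000)]
xf = [a + err for a, err in zip(xa, e)]  # by construction, E = XF - XA

vif_xa_xf = vif_two_regressors(pearson(xa, xf))  # model with XA and XF
vif_xa_e = vif_two_regressors(pearson(xa, e))    # model with XA and E

print(f"VIF implied by r = 0.98:   {vif_from_question:.2f}")
print(f"simulated VIF, XA with XF: {vif_xa_xf:.1f}")
print(f"simulated VIF, XA with E:  {vif_xa_e:.3f}")
```

In this simulation the forecast error is independent of XA by design, which is why its VIF comes out close to 1. If the real forecast errors were correlated with the actual values, the second VIF would be higher.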

E is just the error of your forecast. It can be correlated with the actual value XA (or with XF), but it does not have to be. In this case it is not, given that the VIF is very low. So the multicollinearity is gone, but the interpretation of your coefficients changes. Before, you had both XA and XF in your regression; now you have XA and E (or XF and E; this is not clear from your question).
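To see how the interpretation changes, it may help to write the two specifications side by side. This is a sketch assuming a linear model with outcome y, an intercept, and an error term u, together with the definition E = XF - XA assumed above. The symbols β and u are notation introduced here, not taken from the question.

```latex
\begin{align*}
y &= \beta_0 + \beta_1 X_A + \beta_2 X_F + u \\
  &= \beta_0 + \beta_1 X_A + \beta_2 (X_A + E) + u \\
  &= \beta_0 + (\beta_1 + \beta_2)\, X_A + \beta_2 E + u
\end{align*}
```

Under these assumptions the two regressions give the same fitted values. The coefficient on E is the original coefficient on XF, and the coefficient on XA becomes the sum of the two original coefficients. If E is paired with XF instead, the same substitution gives (β1 + β2) XF - β1 E.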

But absent more context, neither model makes much sense to me. In the first case, the multicollinearity is high, so it is not clear why you don't simply keep only one of the two variables. In the second case, the multicollinearity is low, but it is not clear why you would want the error of the forecast in your model in addition to the forecast itself. So, without more information, I would suggest using either XA or XF.
