Correlation vs. VIF in OLSR Model: Understanding Multicollinearity Decision Factors

Tags: correlation, least squares, multicollinearity, multiple regression, variance-inflation-factor


Date, age, mrt and shops are all predictors in a dataset of 414 observations. Pearson's product-moment correlation shows a sizeable negative correlation between mrt and shops (r = -0.6, whose magnitude is well above the minimum benchmark of 2/sqrt(n) ≈ 0.098 for n = 414). Yet the VIF for both is quite low. Does this mean there is multicollinearity or not? And why is the VIF so low if Pearson's r is -0.6?

PS: I have found a similar question here, but there Pearson's r is not negative, and that might make a difference. Any help would be much appreciated.

Best Answer

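This is largely covered elsewhere, e.g., in my answer to "When can we speak of collinearity". Whether Pearson's $r$ is positive or negative makes no difference: the VIF is built from an $R^2$, and flipping the sign of a predictor leaves every $R^2$ unchanged.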
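To check the sign claim numerically, here is a minimal sketch with NumPy on simulated data (not the OP's data; the helper `vif`, the variables `x1`, `x2`, `x3` and the seed are all illustrative). It computes each VIF from the auxiliary regression ${\rm VIF}_j = 1/(1-R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the other predictors plus an intercept, and then flips the sign of one predictor so that $r \approx -0.6$ becomes $r \approx +0.6$.

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of X (n x p, no intercept column).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on the remaining columns plus an intercept.
    """
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add an intercept column
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out[j] = 1 / (1 - r2)
    return out

# Hypothetical simulated predictors with population correlation -0.6
rng = np.random.default_rng(0)
n = 414
x1 = rng.normal(size=n)
x2 = -0.6 * x1 + 0.8 * rng.normal(size=n)  # var(x2) = 0.36 + 0.64 = 1
x3 = rng.normal(size=n)                    # unrelated predictor
X = np.column_stack([x1, x2, x3])

print(np.corrcoef(x1, x2)[0, 1])  # close to -0.6
print(vif(X))                     # roughly 1.5 for x1 and x2, about 1 for x3

X_flipped = X.copy()
X_flipped[:, 1] *= -1             # flip the sign of x2, so r is now about +0.6
print(np.allclose(vif(X), vif(X_flipped)))  # True
```

Flipping a column's sign only flips the sign of its coefficient in each auxiliary regression, so every $R_j^2$, and hence every VIF, stays the same.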

I have never heard of your "minimum benchmark", and it doesn't make any sense to me. Consider that if you only had $4$ data points, I gather your minimum benchmark would say that a pairwise correlation between variables equal to $r = 1.0$ would be fine (i.e., $2/\sqrt{4} = 1$), whereas if you had $1600$ data points, any $r > .05$ would be problematic (i.e., $2/\sqrt{1600} = .05$). I may be misunderstanding it, but that's nonsensical. Moreover, unless you have perfect multicollinearity, the primary impact is a reduction of power, and that can still be overcome with a sufficiently large $N$ (cf., my answer to: What is the effect of having correlated predictors in a multiple regression model?).
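One plausible reading of the benchmark (an assumption, not something stated in the question) is that $2/\sqrt{n}$ approximates the large-sample two-sided 5% critical value for testing whether $r$ differs from $0$, roughly $1.96/\sqrt{n}$. If so, it only tells you whether a correlation is distinguishable from zero, and it shrinks as $n$ grows, which is exactly why it cannot serve as a collinearity threshold. The helper names below are hypothetical, and the $1.96/\sqrt{n}$ approximation is poor for very small $n$.

```python
import math

def benchmark(n):
    """The questioner's cutoff, 2/sqrt(n)."""
    return 2 / math.sqrt(n)

def approx_critical_r(n, z=1.96):
    """Large-sample approximate two-sided 5% critical |r| for testing rho = 0."""
    return z / math.sqrt(n)

# The cutoff depends only on sample size, not on how collinear the predictors are
for n in (4, 414, 1600):
    print(n, round(benchmark(n), 4), round(approx_critical_r(n), 4))
```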

By (arbitrary) rule of thumb, you have a 'problem with multicollinearity' when you have a ${\rm VIF} \ge 10$. Since ${\rm VIF} = 1/(1-R^2)$, and with respect to pairwise correlations alone $R^2 = r^2$, that would require $r^2 \ge .9$, i.e., $|r| \gtrapprox .95$.
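A minimal sketch of that arithmetic, assuming only two predictors so that $R^2 = r^2$ (the function names are illustrative). It gives the VIF implied by the OP's $r = -0.6$ and the $|r|$ at which the two-predictor VIF reaches $10$.

```python
import math

def vif_two_predictors(r):
    """VIF of either predictor when the model has only two predictors."""
    return 1 / (1 - r ** 2)

def r_for_vif(vif):
    """|r| at which the two-predictor VIF reaches the given value."""
    return math.sqrt(1 - 1 / vif)

print(vif_two_predictors(-0.6))  # about 1.56
print(r_for_vif(10))             # about 0.949
```

With more predictors, $R_j^2$ from the auxiliary regression can only be at least as large as the squared correlation with any single other predictor, so $1/(1-r^2)$ is a lower bound on each VIF. For $r = -0.6$ that bound is about $1.56$, far below $10$.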