Solved – Multicollinearity and high R-squared

multicollinearity, multiple-regression, r-squared, regression

I understand that one of the ways to detect multicollinearity is to observe low t-stats together with a high $R^2$. The t-stats will be low because the standard errors of the coefficients will be high, but why will $R^2$ be high? Can we prove it mathematically?

Best Answer

"High" is relative, and I think that may be the point of the statement.

$R^2$ is the model sum of squares divided by the total sum of squares. In other words, it is the variation in $Y$ that is explained by $X_1$ and $X_2$ divided by the total variation in $Y$. If the t-stats are individually low yet the $R^2$ is high, it means that $X_1$ and $X_2$ together have high explanatory power. Contrast this with the case in which the t-stats are individually low and the $R^2$ is low, too.
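In symbols, using the standard decomposition of the total sum of squares for the fitted values $\hat{y}_i$ and the sample mean $\bar{y}$:

$$
R^2 = \frac{SS_{\text{model}}}{SS_{\text{total}}} = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} = 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}}.
$$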

In the first case, in contrast with the individual t-tests, the F-test of the null hypothesis that the coefficients are jointly zero would reject, suggesting that they are not both zero.
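The link between a high $R^2$ and a significant joint test can be seen from the usual relation between the overall F-statistic and $R^2$ for a regression with an intercept, $k$ slope coefficients (here $k = 2$), and $n$ observations:

$$
F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)},
$$

so a large $R^2$ pushes the F-statistic up even when each individual t-stat is small.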

That said, as @stans implies in his comment, strictly speaking multicollinearity is about the correlation between $X_1$ and $X_2$. The regression of $Y$ on $X_1$ and $X_2$ can even have an $R^2$ of zero, yet if the two predictors are highly correlated with each other we still say there is high multicollinearity. If the correlation between them is 1, we say they are perfectly collinear, and in that case the OLS estimator is not defined, because $X'X$ is singular and cannot be inverted.
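As an illustration (not part of the original answer), here is a small simulation sketch using numpy and statsmodels: when two nearly collinear predictors both drive $Y$, the fitted model typically shows a high $R^2$ and a strongly significant overall F-test, while the individual t-stats are small because the coefficient standard errors are inflated. The variable names, coefficients, and noise scales are arbitrary choices for the demo.

```python
import numpy as np
import statsmodels.api as sm

# Simulate two highly correlated predictors (illustrative values only).
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # x2 is almost a copy of x1
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(scale=1.0, size=n)

# Fit Y on a constant, X1, and X2 by OLS.
X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

print(fit.rsquared)              # high: x1 and x2 jointly explain y well
print(fit.tvalues)               # small t-stats for x1 and x2: inflated standard errors
print(fit.fvalue, fit.f_pvalue)  # overall F-test firmly rejects beta1 = beta2 = 0
```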