Linear Regression – Intuitive Explanation of Multicollinearity Issues

faq, intuition, multicollinearity, regression

The wiki discusses the problems that arise when multicollinearity is present in linear regression. The basic problem is that multicollinearity results in unstable parameter estimates, which makes it very difficult to assess the effect of the independent variables on the dependent variable.

I understand the technical reasons behind the problems ($X'X$ may not be invertible, $X'X$ may be ill-conditioned, etc.), but I am looking for a more intuitive (perhaps geometric?) explanation of the issue.
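To make the ill-conditioning concrete, here is a small numpy sketch (the data and correlation levels are assumed for illustration): as two columns of the design matrix become more correlated, the condition number of $X'X$ grows rapidly.

```python
import numpy as np

# Illustrative sketch: z is built to have correlation ~rho with x,
# and we watch the condition number of X'X blow up as rho -> 1.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)

for rho in [0.0, 0.9, 0.999]:
    # z = rho*x + noise scaled so that corr(x, z) is roughly rho
    z = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)
    X = np.column_stack([np.ones(n), x, z])
    cond = np.linalg.cond(X.T @ X)
    print(f"rho = {rho:<6} condition number of X'X = {cond:.1e}")
```

A large condition number means small changes in the data produce large changes in $(X'X)^{-1}$, and hence in the coefficient estimates.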

Is there a geometric or perhaps some other form of easily understandable explanation as to why multicollinearity is problematic in the context of linear regression?

Best Answer

Consider the simplest case where $Y$ is regressed against $X$ and $Z$ and where $X$ and $Z$ are highly positively correlated. Then the effect of $X$ on $Y$ is hard to distinguish from the effect of $Z$ on $Y$ because any increase in $X$ tends to be associated with an increase in $Z$.

Another way to look at this is to consider the regression equation itself. If we write $Y = b_0 + b_1X + b_2Z + e$, then the coefficient $b_1$ is the increase in $Y$ for every unit increase in $X$ while holding $Z$ constant. But in practice it is often impossible to hold $Z$ constant, and the positive correlation between $X$ and $Z$ means that a unit increase in $X$ is usually accompanied by some increase in $Z$ at the same time.
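A short simulation makes the instability visible (the true coefficients and correlation level here are assumed for illustration): with highly correlated $X$ and $Z$, the individual estimates $\hat b_1$ and $\hat b_2$ swing wildly across repeated samples, while their sum, the combined effect, is estimated quite precisely.

```python
import numpy as np

# Illustrative sketch: repeatedly simulate data with corr(x, z) ~ 0.99,
# true model y = 1 + 2*x + 3*z + e, and fit by least squares.
rng = np.random.default_rng(1)
n = 100

b1_hats, sum_hats = [], []
for _ in range(500):
    x = rng.normal(size=n)
    z = 0.99 * x + np.sqrt(1 - 0.99**2) * rng.normal(size=n)
    y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x, z])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    b1_hats.append(b[1])           # estimate of b1 alone
    sum_hats.append(b[1] + b[2])   # estimate of the combined effect

print("sd of b1 estimates:      ", np.std(b1_hats))   # large
print("sd of (b1 + b2) estimates:", np.std(sum_hats))  # much smaller
```

The data pin down the combined effect of the two correlated predictors well; what they cannot pin down is how to split that effect between them.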

A similar but more complicated explanation holds for other forms of multicollinearity.