Multicollinearity in OLS: Why “No Perfect Multicollinearity” Is Crucial

least-squares, multicollinearity, regression

Let $Y = \beta_1 X_1 + \beta_2 X_2 + \dots + u$, where $u$ is the error term and the $X_i$ are the regressors.

One of the assumptions states that

(1) There is no perfect multicollinearity.

I can't really see what the existence of perfect multicollinearity would do to our coefficients. I understand what perfect multicollinearity means, but what does it do to our regression that it has to be assumed away at the outset?

Also, when there is perfect multicollinearity, why does dropping the intercept help us avoid it? That is, when two regressors are in a linear relationship, why does setting the intercept of the regression to $0$ somehow make the multicollinearity go away?
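(To make the intercept part of the question concrete, here is the standard dummy-variable-trap setup, sketched numerically with NumPy; the `male`/`female` dummies are a made-up toy example. Two complementary dummies sum to the column of ones, so including the intercept creates exact collinearity, and dropping it restores full rank.)

```python
import numpy as np

# Dummy-variable trap: two complementary group dummies sum to the
# intercept column of ones, so all three columns together are collinear.
male = np.array([1., 0., 1., 0., 1.])
female = 1.0 - male
ones = np.ones(5)

X_with_intercept = np.column_stack([ones, male, female])
X_no_intercept = np.column_stack([male, female])

# 3 columns but rank 2: the design is degenerate with the intercept in.
print(np.linalg.matrix_rank(X_with_intercept))  # 2
# Without the intercept the two dummies are linearly independent.
print(np.linalg.matrix_rank(X_no_intercept))    # 2 = number of columns
```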

Best Answer

I bet this has been covered a million times on this board. In a nutshell: because the design matrix becomes degenerate, and there is no unique solution to the linear-algebra problem of OLS. There is an infinite number of equally good solutions, and no way to tell which one is better.
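A quick numerical sketch of the degeneracy, using a made-up toy design matrix where one column is an exact multiple of another:

```python
import numpy as np

# Toy design matrix with perfect collinearity: the second column is
# exactly twice the first, so X has rank 1 despite having 2 columns.
X = np.array([[1., 2.],
              [2., 4.],
              [3., 6.],
              [4., 8.]])

XtX = X.T @ X
# Rank 1, not 2: X^T X is singular and cannot be inverted.
print(np.linalg.matrix_rank(XtX))  # 1
```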

Technical details: the design matrix is constructed by putting all $p$ variables in columns and all $n$ observations in rows; it is $X_{ij}$, with rows $i = 1, \dots, n$ and columns $j = 1, \dots, p$. When there is perfect collinearity, the matrix $X$ can be reduced to a matrix $X'_{ik}$ whose columns represent a new set of variables $k = 1, \dots, p'$ with $p' < p$. In other words, the new design matrix $X'$ has fewer columns than the original, yet no information is lost.
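Continuing the toy example above (a made-up matrix whose second column is twice the first), the reduction to $X'$ can be sketched like this: drop the redundant column, and check that the dropped column is still fully recoverable from the kept one, so no information is lost.

```python
import numpy as np

X = np.array([[1., 2.],
              [2., 4.],
              [3., 6.]])  # second column = 2 * first column

# Reduced design X': keep only the first column (p' = 1 < p = 2).
X_reduced = X[:, [0]]

# No information lost: the dropped column is an exact multiple
# of the column we kept.
assert np.allclose(X[:, 1], 2 * X_reduced[:, 0])
```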

In this case the usual solution $\beta=(X^TX)^{-1}X^TY$ does not exist, because $X^TX$ is singular and has no inverse. On the other hand, the solution $\beta'=(X'^TX')^{-1}X'^TY$ on the new set of variables does exist. So the only problem with perfect collinearity is that the original set of variables does not have a unique solution; it still has solutions.
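A sketch of both claims with NumPy, on a made-up exactly collinear design: the normal equations fail on the full $X$, but work fine on the reduced $X'$.

```python
import numpy as np

X = np.array([[1., 2.],
              [2., 4.],
              [3., 6.]])          # second column = 2 * first
y = np.array([2., 4., 6.])        # y = 2 * x1, so an exact fit exists

# Normal equations on the full design fail: X^T X is exactly singular.
try:
    np.linalg.solve(X.T @ X, X.T @ y)
    solve_failed = False
except np.linalg.LinAlgError:
    solve_failed = True

# On the reduced design X' (redundant column dropped) they work fine.
X_r = X[:, [0]]
beta_r = np.linalg.solve(X_r.T @ X_r, X_r.T @ y)
print(solve_failed, beta_r)       # True [2.]
```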

The implication is that you can pick any of the non-unique solutions, and it will be as good as any other. Note, it will not be as bad as any other, either. So you can still use such a solution to predict $Y$. The only problem is that you have to step outside the typical OLS method to find the solutions, because OLS's linear-algebra trick doesn't work. Things like gradient descent will work.
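A sketch of that last point with plain gradient descent on the squared-error loss (NumPy, made-up collinear data): two different starting points converge to two different coefficient vectors, yet both give the same fitted values, i.e. both are "equally good" solutions.

```python
import numpy as np

X = np.array([[1., 2.],
              [2., 4.],
              [3., 6.],
              [4., 8.]])            # second column = 2 * first
y = np.array([5., 10., 15., 20.])   # y = 5 * x1

def gradient_descent(beta, lr=0.01, steps=5000):
    """Minimize ||X @ beta - y||^2 by plain gradient descent."""
    for _ in range(steps):
        grad = X.T @ (X @ beta - y)
        beta = beta - lr * grad
    return beta

beta_a = gradient_descent(np.zeros(2))
beta_b = gradient_descent(np.array([10., -3.]))

# Different coefficient vectors (they differ by a null-space direction)...
print(beta_a, beta_b)
# ...but identical predictions: both solutions fit y equally well.
print(X @ beta_a, X @ beta_b)
```

The step size `lr=0.01` is an arbitrary choice that happens to be small enough for this toy problem; gradient descent from zero lands on the minimum-norm solution, while the other start retains its null-space component.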