Question about matrix equation for coefficients in linear regression

linear algebra, linear regression, numerical methods

There is a matrix equation for solving a linear regression, $\vec{y}=X\vec{\beta}$, where $X$ is the matrix of features, $\vec{\beta}=[\beta_1,\dots,\beta_n]$ is the vector of coefficients for each feature, and $\vec{y}$ is the vector of measurements, i.e. the true values you are trying to fit.

The matrix equation I have seen to solve for $\vec{\beta}$ is $$\vec{\beta}=(X^TX)^{-1}X^T\vec{y}$$

I am wondering: why isn't the solution just $\vec{\beta}=X^{-1}\vec{y}$?

Is it because $X$ may be singular, and this is some kind of pseudo-inverse?
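
For concreteness, here is a minimal numerical sketch of the formula above (assuming NumPy and synthetic data; the variable names are just illustrative), checked against NumPy's built-in least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.standard_normal((m, n))                    # feature matrix (tall: m > n)
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(m)   # noisy measurements

# Normal-equation formula: beta = (X^T X)^{-1} X^T y
beta_normal = np.linalg.inv(X.T @ X) @ X.T @ y
# NumPy's least-squares solver for comparison
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_normal, beta_lstsq))        # True: same least-squares solution
```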

Best Answer

Yes, it is exactly as you suggested: in general $X$ is not even square, so $X^{-1}$ does not exist, and $(X^\top X)^{-1}X^\top$ plays the role of a pseudo-inverse.

Typically $X\in\mathbb{R}^{m\times n}$ is a tall matrix ($m\ge n$) with $\operatorname{rank}(X)=n$, because the number of data points $m$ should be at least the number of parameters $n$. Moreover, in practice most observed data matrices attain their maximal rank.

In this situation, $X^\top X$ has full rank, and the pseudo-inverse $X^\dagger$ is exactly $(X^\top X)^{-1}X^\top$.
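
As a quick numerical sketch of this (assuming NumPy and a randomly generated tall $X$, which is almost surely of full column rank), the Moore–Penrose pseudo-inverse computed by `np.linalg.pinv` should agree with $(X^\top X)^{-1}X^\top$ up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3
X = rng.standard_normal((m, n))              # tall matrix, almost surely rank n

pinv_formula = np.linalg.inv(X.T @ X) @ X.T  # (X^T X)^{-1} X^T
pinv_numpy = np.linalg.pinv(X)               # Moore-Penrose pseudo-inverse (via SVD)

print(np.allclose(pinv_formula, pinv_numpy))  # True
```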

As a quick check, we have $$((X^\top X)^{-1}X^\top ) X=I$$ and $$X(X^\top X)^{-1}X^\top=\text{the projection matrix onto the column space of }X.$$
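
Both identities are easy to confirm numerically as well; the sketch below (again assuming NumPy and a random tall $X$) checks the left-inverse property and verifies that $P=X(X^\top X)^{-1}X^\top$ behaves like an orthogonal projector, i.e. it is symmetric, idempotent, and leaves the columns of $X$ unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 3))

Xdag = np.linalg.inv(X.T @ X) @ X.T       # (X^T X)^{-1} X^T
P = X @ Xdag                              # candidate projector onto col(X)

print(np.allclose(Xdag @ X, np.eye(3)))   # left inverse: (X^T X)^{-1} X^T X = I
print(np.allclose(P, P.T))                # projector is symmetric
print(np.allclose(P @ P, P))              # projector is idempotent
print(np.allclose(P @ X, X))              # P fixes the column space of X
```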

To prove that, for $X\in\mathbb{R}^{m\times n}$ with $\operatorname{rank}(X)=n$, we have $X^\dagger=(X^\top X)^{-1}X^\top$, let the SVD of $X$ be $X=U\Sigma V^\top$.

By definition of pseudo-inverse, $X^\dagger=V\Sigma^{-1}U^\top$, where

$$(\Sigma^{-1})_{ij}=\begin{cases} 0 & \text{ if }\Sigma_{ji}=0 \\ 1/\Sigma_{ji} & \text{ o.w. }\end{cases}$$

That is, the $(i,j)$ entry of $\Sigma^{-1}$ is zero whenever the $(j,i)$ entry of $\Sigma$ is zero; otherwise it is the reciprocal of that entry.
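
The sketch below (NumPy assumed; $X$ random and hence almost surely of full column rank) builds $\Sigma^{-1}$ entrywise exactly as defined above and confirms that $V\Sigma^{-1}U^\top$ reproduces `np.linalg.pinv(X)`:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 8, 3
X = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(X)              # full SVD: U is m x m, Vt is n x n
Sigma_inv = np.zeros((n, m))             # Sigma^{-1} is n x m
Sigma_inv[:n, :n] = np.diag(1.0 / s)     # reciprocal of each (nonzero) singular value

Xdag = Vt.T @ Sigma_inv @ U.T            # V Sigma^{-1} U^T
print(np.allclose(Xdag, np.linalg.pinv(X)))  # True
```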

Also,

$$(X^\top X)^{-1}X^\top = (V \Sigma^\top \Sigma V^\top)^{-1} V\Sigma^\top U^\top= V (\Sigma^\top \Sigma)^{-1}\Sigma^\top U^\top,$$ using $U^\top U=I$ and the orthogonality of $V$.

Finally, by definition of $\Sigma^{-1}$, we have $(\Sigma^\top \Sigma)^{-1}\Sigma^\top=\Sigma^{-1}$, so the proof is complete.
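
To spell out that last equality: since $\operatorname{rank}(X)=n$, the singular values $\sigma_1,\dots,\sigma_n$ are all positive and $\Sigma^\top\Sigma=\operatorname{diag}(\sigma_1^2,\dots,\sigma_n^2)$, so

$$\big((\Sigma^\top \Sigma)^{-1}\Sigma^\top\big)_{ij}=\frac{1}{\sigma_i^{2}}(\Sigma^\top)_{ij}=\begin{cases} 1/\Sigma_{ji} & \text{ if }j=i \\ 0 & \text{ o.w. }\end{cases}$$

which is exactly the entrywise definition of $\Sigma^{-1}$ given above.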
