Prove that $ \Phi(\Phi^T\Phi)^{-1}\Phi^T=I $ if $\Phi$ has more columns than rows

Tags: linear-regression, matrices, matrix-equations

In linear regression, for a data set $\bar t$, the least-squares solution of the equation $\bar t = \Phi\bar w$ is $$\hat{\bar w} = (\Phi^T\Phi)^{-1}\Phi^T\bar t$$ where $\Phi$ is the design matrix and $\bar w$ is the weight vector.
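As a quick sanity check of this formula, here is a minimal NumPy sketch for the ordinary overdetermined case (the random design matrix, targets, and seed are made up for illustration); the normal-equations solution agrees with the library least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.standard_normal((10, 3))  # 10 data points, 3 weights (tall, full column rank)
t = rng.standard_normal(10)

# Normal-equations solution: w_hat = (Phi^T Phi)^{-1} Phi^T t
w_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# Agrees with NumPy's least-squares solver
w_lstsq, *_ = np.linalg.lstsq(Phi, t, rcond=None)
assert np.allclose(w_hat, w_lstsq)
```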

When the number of data points is less than the number of weights (i.e. when $\Phi$ has fewer rows than columns), we can choose weights such that the data points are fitted exactly, i.e.
$$\begin{align*}\bar t &= \Phi \hat{\bar w}\\
&= \Phi(\Phi^T\Phi)^{-1}\Phi^T\bar t
\end{align*}$$


Clearly, $$\Phi(\Phi^T\Phi)^{-1}\Phi^T = I$$ when $\Phi$ has fewer rows than columns. But how can this be proved? I tried writing $I = (\Phi^T\Phi)^{-1}(\Phi^T\Phi)$ but that got me nowhere.

Edit: Let us assume that $\Phi$ has full rank (all rows linearly independent).
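One caveat worth noting: when $\Phi$ has fewer rows than columns, $\Phi^T\Phi$ is rank-deficient and hence singular, so $(\Phi^T\Phi)^{-1}$ has to be read as the Moore-Penrose pseudoinverse. Under that reading, a minimal numerical check (random wide $\Phi$ chosen purely for illustration) confirms the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.standard_normal((3, 10))  # 3 rows < 10 columns, full row rank

# Phi^T Phi is 10x10 but has rank 3, so it is singular; use the
# Moore-Penrose pseudoinverse in place of the ordinary inverse.
P = Phi @ np.linalg.pinv(Phi.T @ Phi) @ Phi.T
assert np.allclose(P, np.eye(3))  # P equals the 3x3 identity
```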

Best Answer

Proceeding as in your comment, write $\Phi=U\Sigma V^T$ where $U$ and $V$ are orthogonal, and $\Sigma$ is rectangular-diagonal with non-negative entries along the diagonal. Plugging into $\Phi(\Phi^T\Phi)^{-1}\Phi^T$ and simplifying (using $\Phi^T\Phi = V\Sigma^T\Sigma V^T$ and $V^TV=I$) gives $U\Sigma (\Sigma^T\Sigma)^{-1}\Sigma^TU^T$. Note that $(\Sigma^T\Sigma)^{-1}$ is diagonal with entries $1/\sigma_i^2$, where $\sigma_i$ are the diagonal elements of $\Sigma$. Thus an easy computation shows that $\Sigma (\Sigma^T\Sigma)^{-1}\Sigma^T=I$, and the desired result follows.
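The computation in this answer can also be checked numerically. A sketch in NumPy (again with a made-up random wide $\Phi$, and with the middle inverse read as a pseudoinverse since $\Sigma^T\Sigma$ is singular in the wide case):

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.standard_normal((3, 10))  # wide, full row rank

# Full SVD: U is 3x3, Sigma is 3x10, Vt is 10x10
U, s, Vt = np.linalg.svd(Phi, full_matrices=True)
Sigma = np.zeros((3, 10))
Sigma[:3, :3] = np.diag(s)

# Sigma (Sigma^T Sigma)^+ Sigma^T = I_3: each surviving diagonal entry
# is sigma_i * (1/sigma_i^2) * sigma_i = 1.
middle = np.linalg.pinv(Sigma.T @ Sigma)
assert np.allclose(Sigma @ middle @ Sigma.T, np.eye(3))

# Hence U Sigma (Sigma^T Sigma)^+ Sigma^T U^T = U I U^T = I.
assert np.allclose(U @ Sigma @ middle @ Sigma.T @ U.T, np.eye(3))
```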
