Prove that $ \Phi(\Phi^T\Phi)^{-1}\Phi^T=I $ if $\Phi$ has more columns than rows

Tags: linear-regression, matrices, matrix-equations

In linear regression, for a data set $\bar t$, the least-squares solution of the equation $\bar t = \Phi\bar w$ is $$\hat{\bar w} = (\Phi^T\Phi)^{-1}\Phi^T\bar t$$ where $\Phi$ is the design matrix and $\bar w$ is the weight vector.
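As a quick sanity check of this formula, here is a minimal NumPy sketch for the ordinary overdetermined case (the random design matrix, targets, and seed are made up for illustration); the normal-equations solution agrees with the library least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.standard_normal((10, 3))  # 10 data points, 3 weights (tall, full column rank)
t = rng.standard_normal(10)

# Normal-equations solution: w_hat = (Phi^T Phi)^{-1} Phi^T t
w_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# Agrees with NumPy's least-squares solver
w_lstsq, *_ = np.linalg.lstsq(Phi, t, rcond=None)
assert np.allclose(w_hat, w_lstsq)
```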

When the number of data points is less than the number of weights (i.e. when $\Phi$ has fewer rows than columns), we can choose weights such that the data points are fitted exactly, i.e.
$$\begin{align*}\bar t &= \Phi \hat{\bar w}\\
&= \Phi(\Phi^T\Phi)^{-1}\Phi^T\bar t
\end{align*}$$


Clearly, $$\Phi(\Phi^T\Phi)^{-1}\Phi^T = I$$ when $\Phi$ has fewer rows than columns. But how can this be proved? I tried writing $I = (\Phi^T\Phi)^{-1}(\Phi^T\Phi)$ but that got me nowhere.

Edit: Let us assume that $\Phi$ has full rank (all rows linearly independent).
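One caveat worth noting: when $\Phi$ has fewer rows than columns, $\Phi^T\Phi$ is rank-deficient and hence singular, so $(\Phi^T\Phi)^{-1}$ has to be read as the Moore-Penrose pseudoinverse. Under that reading, a minimal numerical check (random wide $\Phi$ chosen purely for illustration) confirms the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.standard_normal((3, 10))  # 3 rows < 10 columns, full row rank

# Phi^T Phi is 10x10 but has rank 3, so it is singular; use the
# Moore-Penrose pseudoinverse in place of the ordinary inverse.
P = Phi @ np.linalg.pinv(Phi.T @ Phi) @ Phi.T
assert np.allclose(P, np.eye(3))  # P equals the 3x3 identity
```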

Best Answer

Proceeding as in your comment, write $\Phi=U\Sigma V^T$ where $U$ and $V$ are orthogonal, and $\Sigma$ is rectangular-diagonal with non-negative entries along the diagonal. Plugging into $\Phi(\Phi^T\Phi)^{-1}\Phi^T$ and simplifying (using $\Phi^T\Phi = V\Sigma^T\Sigma V^T$ and $V^TV=I$) gives $U\Sigma (\Sigma^T\Sigma)^{-1}\Sigma^TU^T$. Note that $(\Sigma^T\Sigma)^{-1}$ is diagonal with entries $1/\sigma_i^2$, where $\sigma_i$ are the diagonal elements of $\Sigma$. Thus an easy computation shows that $\Sigma (\Sigma^T\Sigma)^{-1}\Sigma^T=I$, and the desired result follows.
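The computation in this answer can also be checked numerically. A sketch in NumPy (again with a made-up random wide $\Phi$, and with the middle inverse read as a pseudoinverse since $\Sigma^T\Sigma$ is singular in the wide case):

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.standard_normal((3, 10))  # wide, full row rank

# Full SVD: U is 3x3, Sigma is 3x10, Vt is 10x10
U, s, Vt = np.linalg.svd(Phi, full_matrices=True)
Sigma = np.zeros((3, 10))
Sigma[:3, :3] = np.diag(s)

# Sigma (Sigma^T Sigma)^+ Sigma^T = I_3: each surviving diagonal entry
# is sigma_i * (1/sigma_i^2) * sigma_i = 1.
middle = np.linalg.pinv(Sigma.T @ Sigma)
assert np.allclose(Sigma @ middle @ Sigma.T, np.eye(3))

# Hence U Sigma (Sigma^T Sigma)^+ Sigma^T U^T = U I U^T = I.
assert np.allclose(U @ Sigma @ middle @ Sigma.T @ U.T, np.eye(3))
```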
