[Math] Moore-Penrose Inverse as least-squares solution

Tags: least squares, pseudoinverse, systems of equations

I'm trying to understand a conclusion in [1], [2].

There they state a definition and a theorem:

Definition (5.2 in [1], 2.2 in [2]): For a general linear system $Ax=y$, we say that $\hat{x}$ is a least-squares solution if $||A\hat{x}-y|| = \min_x ||Ax-y||$. We say that $\hat{x}$ is a minimum least-squares solution if $\hat{x}$ has the smallest norm among all least-squares solutions.

Theorem (5.1 in [1], 2.1 in [2]): Let there exist a matrix $G$ such that $Gy$ is a minimum least-squares solution of a linear system $Ax=y$. Then it is necessary and sufficient that $G = A^+$, the Moore-Penrose generalized inverse of the matrix $A$.
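As a quick sanity check on the theorem (a NumPy sketch of my own, not from [1] or [2]; the matrix and right-hand side are arbitrary random examples), $A^+y$ agrees with the minimum-norm least-squares solution even when $A$ is rank deficient, while any other least-squares solution has a larger norm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Rank-deficient A (5x4, rank 2), so the least-squares solution is not unique.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
y = rng.standard_normal(5)

x_pinv = np.linalg.pinv(A) @ y                    # candidate G y with G = A^+
x_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]    # NumPy's min-norm LS solution
print(np.allclose(x_pinv, x_lstsq))               # True

# Shifting by a null-space vector keeps the residual but increases the norm.
null_vec = np.linalg.svd(A)[2][-1]                # a unit vector in N(A)
x_other = x_pinv + null_vec
print(np.isclose(np.linalg.norm(A @ x_other - y),
                 np.linalg.norm(A @ x_pinv - y)))        # True: same residual
print(np.linalg.norm(x_other) > np.linalg.norm(x_pinv))  # True: larger norm
```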

Now, both papers want to find a minimum least-squares solution of the linear system $H\beta = T$, that is
$$||H\hat{\beta} - T|| = \min_\beta ||H\beta - T||, \qquad (1)$$
where $H \in \mathbb{R}^{N \times \tilde{N}}$, $\beta \in \mathbb{R}^{\tilde{N} \times m}$ and $T \in \mathbb{R}^{N \times m}$.

They argue as follows: "According to the theorem, the minimum least-squares solution of the above linear system is $\hat{\beta} = H^+T$, where $H^+$ is the Moore-Penrose generalized inverse of matrix $H$."
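For concreteness, here is how I read their claim numerically (a NumPy sketch of my own with arbitrary dimensions, not taken from [1] or [2]): for a matrix right-hand side $T$, the product $H^+T$ simply collects, column by column, the minimum-norm least-squares solutions of $H\beta_j = t_j$ for the columns $t_j$ of $T$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, N_tilde, m = 6, 4, 3

H = rng.standard_normal((N, N_tilde))
T = rng.standard_normal((N, m))

beta_hat = np.linalg.pinv(H) @ T                  # shape (N_tilde, m)

# Column-by-column comparison with NumPy's minimum-norm least-squares routine.
for j in range(m):
    col = np.linalg.lstsq(H, T[:, j], rcond=None)[0]
    assert np.allclose(beta_hat[:, j], col)
```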

Let $\hat{\beta}$ be the solution found in $(1)$. To apply the theorem, there must exist a matrix $G$ such that $\hat{\beta} = GT$. But can we be sure that such a matrix $G$ always exists? From my point of view, I can't see why this should always hold, because of the dimensions of $\hat{\beta}$ and $T$.

Best Answer

When does the least squares solution exist?

$$\mathbf{A}x = b$$

$$\mathbf{A} \in \mathbb{C}^{m\times n}_{\rho}, \qquad b \in \mathbb{C}^{m}, \qquad x \in \mathbb{C}^{n}$$

In short, when the data vector $b$ is not in the null space of the adjoint: $$ b\notin\color{red}{\mathcal{N}\left( \mathbf{A}^{*} \right)}. $$ The fitted vector $\mathbf{A}\hat{x}$ of a least-squares solution $\hat{x}$ is the projection of the data vector $b$ onto the range space $\color{blue}{\mathcal{R}\left( \mathbf{A}\right)}$. A projection exists provided that some component of the data vector lies in the range space.

[Figure: projection of the data vector $b$ onto the range space $\color{blue}{\mathcal{R}\left( \mathbf{A}\right)}$]
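To see the projection numerically (a sketch with an arbitrary random matrix, not tied to the papers above), the fitted vector $\mathbf{A}\hat{x}$ equals $\mathbf{A}\mathbf{A}^{+}b$, and the residual is orthogonal to the range space:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((7, 3))
b = rng.standard_normal(7)

A_pinv = np.linalg.pinv(A)
x_ls = A_pinv @ b                                 # least-squares solution

P_range = A @ A_pinv                              # orthogonal projector onto R(A)
print(np.allclose(A @ x_ls, P_range @ b))         # True: A x_ls is the projection

# The residual b - A x_ls lies in N(A*), orthogonal to the range space.
print(np.allclose(A.conj().T @ (b - A @ x_ls), 0))  # True
```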

The related post How does the SVD solve the least squares problem? works through a derivation of the least-squares solution based on the singular value decomposition of a nonzero matrix.

Another approach is to start with the guaranteed existence of the singular value decomposition $$ \mathbf{A} = \mathbf{U} \, \Sigma \, \mathbf{V}^{*}, $$ and build the Moore-Penrose pseudoinverse $$\mathbf{A}^{+} = \mathbf{V} \Sigma^{+} \mathbf{U}^{*}.$$ The product $\mathbf{A}\mathbf{A}^{+}$ is then the orthogonal projection operator onto the range space $\color{blue}{\mathcal{R}\left( \mathbf{A}\right)}$.
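A small sketch of that construction (NumPy, arbitrary rank-deficient example; the tolerance rule mimics what `np.linalg.pinv` does, but is an assumption here):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))  # rank 2

U, s, Vh = np.linalg.svd(A, full_matrices=False)

# Invert only the singular values above a tolerance; zero out the rest.
tol = max(A.shape) * np.finfo(s.dtype).eps * s.max()
s_inv = np.zeros_like(s)
s_inv[s > tol] = 1.0 / s[s > tol]

A_pinv = Vh.conj().T @ np.diag(s_inv) @ U.conj().T   # A^+ = V Sigma^+ U^*
print(np.allclose(A_pinv, np.linalg.pinv(A)))        # True
```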
