[Math] Pseudo-Inverse Solution of a Linear System of Equations Using the SVD

linear algebra

I have read about SVD theory and its use for solving linear systems of equations.
I have seen many papers mention a property of the solution, yet none of them gives a proof of it.

The Property:
The solution given by the pseudo-inverse ( $ V {\Sigma}^{-1} {U}^{H} $ ) minimizes both the error norm and the solution norm.

The minimization of the error norm is easily proved using the normal equations ( $ \hat{x} $ is a least-squares solution iff $ {A}^{H} A \hat{x} = {A}^{H} b $ ).
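
As a quick numerical sanity check (an arbitrary random example of my own, not from any of the papers), $ \hat{x} = A^{+} b $ does satisfy the normal equations:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))  # arbitrary tall complex matrix
b = rng.standard_normal(6) + 1j * rng.standard_normal(6)

x_hat = np.linalg.pinv(A) @ b                     # pseudo-inverse (least-squares) solution

# Normal equations: A^H A x_hat = A^H b
print(np.allclose(A.conj().T @ A @ x_hat, A.conj().T @ b))   # True (up to rounding)
```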

Yet beyond the intuition that $ \hat{x} $ must lie in the row space of $A$, and hence its norm is minimized, I couldn't find a formal proof of this.

Moreover, let's define $ A = U \Sigma {V}^{H} $. When we calculate its pseudo-inverse we handle $ \Sigma $ with extra care, inverting only its non-zero entries. What is the formal reasoning for that?
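
To make the question concrete, here is a minimal sketch of what I mean (my own illustration, with an arbitrary rank-deficient matrix): the pseudo-inverse is assembled from the SVD by inverting only the singular values above a small tolerance.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.]])                      # rank-deficient 3x3 example

U, s, Vh = np.linalg.svd(A)

# Invert only the singular values that are numerically non-zero
tol = max(A.shape) * np.finfo(float).eps * s.max()
s_inv = np.zeros_like(s)
s_inv[s > tol] = 1.0 / s[s > tol]

A_pinv = Vh.conj().T @ np.diag(s_inv) @ U.conj().T
print(np.allclose(A @ A_pinv @ A, A))             # True: first Moore-Penrose condition
print(np.allclose(A_pinv, np.linalg.pinv(A)))     # True: matches NumPy's pseudo-inverse
```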

Thanks!

Best Answer

As Jack pointed out in a comment, the question is somewhat unclear. I interpret it as follows:

You have a system $Ax=b$ of linear equations. You understand why $\hat x=A^+b$, where $A^+$ is the (Moore–Penrose) pseudo-inverse of $A$, minimizes the error norm $|Ax-b|$, but you want to know why, if $Ax=b$ has solutions, $\hat x$ is the unique solution that minimizes the solution norm $|x|$.

You state that $\hat x$ must lie in the row space of $A$. That's true in the real case; more generally, in the complex case, $\hat x$ must lie in the column space of $A^*$, or equivalently in the orthogonal complement of the null space of $A$. This can be seen in two different ways:

Any solution $x$ of $Ax=b$ can be written as $x=u+v$, where $u$ is in the null space of $A$ and $v$ in its orthogonal complement. Then $|x|^2=|u|^2+|v|^2$. Since $u$ is in the null space of $A$, $v$ also solves $Ax=b$. Thus, the solution with minimal norm must have $u=0$, and must therefore lie in the orthogonal complement of the null space.
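
To illustrate this decomposition numerically (an arbitrary example of my own, not part of the argument): for a consistent underdetermined system, adding a null-space component to the minimum-norm solution still gives a solution, but strictly increases the norm.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))                   # wide full-row-rank matrix, so Ax = b is consistent
b = rng.standard_normal(3)

x_hat = np.linalg.pinv(A) @ b                     # minimum-norm solution (lies in the row space of A)
v = np.linalg.svd(A)[2][-1]                       # a unit vector in the null space of A
x_other = x_hat + 2.0 * v                         # another solution, with a null-space component added

print(np.allclose(A @ x_other, b))                          # True: still solves Ax = b
print(np.linalg.norm(x_hat) < np.linalg.norm(x_other))      # True: the null-space part only adds norm
```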

Alternatively, this can be derived using constrained minimization. To minimize $|x|^2$ under the system of constraints $Ax=b$, we introduce a vector $\lambda$ of Lagrange multipliers and consider the function $f(x)=x^*x-(Ax-b)^*\lambda$. Its gradient is $2x-A^*\lambda$, and setting this to zero yields $\hat x=A^*\lambda/2$. Thus $\hat x$ is in the column space of $A^*$.
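
As a numerical illustration of this (again an arbitrary example of my own), one can recover a Lagrange multiplier $\lambda$ satisfying $A^*\lambda = 2\hat x$ by solving a least-squares problem; the fact that it is solved exactly confirms that $\hat x$ lies in the column space of $A^*$.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))
b = rng.standard_normal(3) + 1j * rng.standard_normal(3)

x_hat = np.linalg.pinv(A) @ b

# Solve A^* lambda = 2 x_hat in the least-squares sense; an (almost) exact fit
# confirms that x_hat lies in the column space of A^*.
lam = np.linalg.lstsq(A.conj().T, 2 * x_hat, rcond=None)[0]
print(np.allclose(A.conj().T @ lam, 2 * x_hat))   # True
```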

Now there can be only one solution of $Ax=b$ in the orthogonal complement of the null space of $A$. If $x_1$ and $x_2$ are two solutions, then $A(x_1-x_2)=Ax_1-Ax_2=(Ax_1-b)-(Ax_2-b)=0$, so their difference is in the null space of $A$. If the solutions are both in the orthogonal complement of the null space, then so is their difference; since the difference is both in the null space and in its orthogonal complement, it must be zero.

Thus it suffices to show that $A^+b$ solves $Ax=b$ and lies in the column space of $A^*$. We know that if $Ax=b$ has solutions then $A^+b$ is one of them, because $A^+b$ minimizes the error norm. We also know that $A^+b$ lies in the column space of $A^*$, since $A^+b=(A^*A^{+*}A^+)b=A^*(A^{+*}A^+b)$.
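
The identity $A^+ = A^*A^{+*}A^+$ used in the last step follows from the Moore–Penrose properties: $A^*A^{+*}A^+=(A^+A)^*A^+=A^+AA^+=A^+$, since $A^+A$ is Hermitian. It is also easy to check numerically; here is a small sketch with an arbitrary random matrix (my own example).

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
A_pinv = np.linalg.pinv(A)

# A^+ = A^* A^{+*} A^+, hence A^+ b always lies in the column space of A^*
print(np.allclose(A_pinv, A.conj().T @ A_pinv.conj().T @ A_pinv))   # True
```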