Why $x = \mathbf A^{\dagger}b$ minimizes $\|x\|$ among all minimizers of $\|\mathbf Ax - b\|$

least-squares, linear-algebra, pseudoinverse, svd

For an arbitrary matrix $\mathbf A\in \mathbb R^{m \times n}$ with $\operatorname{rank}(\mathbf A) = r$, and a vector $b \in \mathbb R^m$, solve the least-squares problem:
$$\min_x \|\mathbf Ax - b\|_2. $$
From the SVD $\mathbf A = \mathbf U \mathbf \Sigma \mathbf V^T$, the pseudoinverse of $\mathbf A$ is
$$\mathbf A^{\dagger}=\mathbf{V} \mathbf \Sigma^{\dagger} {\mathbf U^T}$$
where $$\mathbf \Sigma^{\dagger}=
\begin{bmatrix}
\mathbf \Sigma_{r \times r}^{-1} & \mathbf 0 \\
\mathbf 0 & \mathbf 0
\end{bmatrix}_{n \times m}.$$
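
As a concrete sanity check, here is a short NumPy sketch that builds $\mathbf A^{\dagger}=\mathbf V \mathbf \Sigma^{\dagger} \mathbf U^T$ from the SVD and compares it against `np.linalg.pinv`. The random rank-deficient test matrix and the sizes $m=5$, $n=3$, $r=2$ are illustrative assumptions, not part of the question:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test problem: a rank-deficient 5x3 matrix (m=5, n=3, r=2).
m, n, r = 5, 3, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
b = rng.standard_normal(m)

# Full SVD: U is m x m, Vt is n x n, s holds the min(m, n) singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Build Sigma^dagger (n x m): invert the nonzero singular values, zero elsewhere.
tol = max(m, n) * np.finfo(float).eps * s[0]
rank = int(np.sum(s > tol))
Sigma_dag = np.zeros((n, m))
Sigma_dag[:rank, :rank] = np.diag(1.0 / s[:rank])

A_dag = Vt.T @ Sigma_dag @ U.T  # A^dagger = V Sigma^dagger U^T

# Agrees with NumPy's built-in pseudoinverse, and A_dag @ b matches the
# minimum-norm least-squares solution that lstsq returns.
assert np.allclose(A_dag, np.linalg.pinv(A))
assert np.allclose(A_dag @ b, np.linalg.lstsq(A, b, rcond=None)[0])
```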

I see that $x = \mathbf A^{\dagger}b$ agrees with the canonical least-squares solution $(A^TA)^{-1}A^Tb$ when $\mathbf A$ has full column rank ($r = n$), and in particular when $m=n=r$. But I don't understand why it is the solution that minimizes $\|x\|_2$.
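
For what it's worth, the full-column-rank identity is easy to confirm numerically. A minimal sketch, assuming a random tall test matrix (which has full column rank almost surely):

```python
import numpy as np

rng = np.random.default_rng(1)

# Full-column-rank case (r = n): a random 6x3 matrix works almost surely.
m, n = 6, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x_pinv = np.linalg.pinv(A) @ b                # A^dagger b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)  # (A^T A)^{-1} A^T b

assert np.allclose(x_pinv, x_normal)
```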

Best Answer

The column space of $A$ is the same as the span of the first $r$ columns of $U$; let $U_r$ be this $m \times r$ matrix. Its columns are orthonormal, so $U_r^\top U_r = I_r$, and the projection of $b$ onto the column space of $A$ is $\hat{b} := U_r (U_r^\top U_r)^{-1} U_r^\top b = U_r U_r^\top b$.

If $x$ is a solution to the optimization problem, then $Ax = \hat{b}$; conversely, any $x$ with $Ax = \hat{b}$ attains the minimal residual $\|\hat b - b\|_2$, so the least-squares minimizers are exactly the solutions of $Ax = \hat{b}$.
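
Both facts can be checked numerically. A minimal NumPy sketch, again assuming a random rank-deficient test matrix (sizes chosen arbitrarily): it forms $\hat b = U_r U_r^\top b$ and verifies that a least-squares minimizer returned by `np.linalg.lstsq` satisfies $Ax = \hat b$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical rank-2 test problem, 5x3.
m, n, r = 5, 3, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
b = rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A)
U_r = U[:, :r]             # orthonormal basis for col(A)

b_hat = U_r @ (U_r.T @ b)  # projection of b onto col(A)

# A least-squares minimizer x satisfies A x = b_hat.
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.allclose(A @ x_ls, b_hat)
```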

Thus, we can consider a second optimization problem: minimize $\|x\|_2$ subject to $Ax = \hat{b}$.

First, we check that $x := A^\dagger b$ is feasible, i.e. $A A^\dagger b = \hat{b}$. Since $V^\top V = I$, $$AA^\dagger b = U \Sigma V^\top V \Sigma^\dagger U^\top b = U \begin{bmatrix} I_{r \times r} & \mathbf 0 \\ \mathbf 0 & \mathbf 0_{(m-r) \times (m-r)}\end{bmatrix} U^\top b = U_r U_r^\top b = \hat{b}.$$
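
The same computation in NumPy, with the usual assumed random rank-deficient test matrix: $A A^\dagger$ is exactly the projector $U_r U_r^\top$ onto the column space, so $x = A^\dagger b$ is feasible.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical rank-2 test problem, 5x3.
m, n, r = 5, 3, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
b = rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A)

# A A^dagger = U diag(I_r, 0) U^T = U_r U_r^T, the projector onto col(A).
P = A @ np.linalg.pinv(A)
assert np.allclose(P, U[:, :r] @ U[:, :r].T)

# Hence x = A^dagger b is feasible: A x equals the projection b_hat.
b_hat = U[:, :r] @ (U[:, :r].T @ b)
assert np.allclose(A @ (np.linalg.pinv(A) @ b), b_hat)
```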

Next we justify that it has minimum norm. Every feasible $x$ can be written as $x = A^\dagger b + z$ for some $z$ in the nullspace of $A$, and the nullspace of $A$ is the span of the last $n - r$ columns of $V$. On the other hand, $A^\dagger b = V\Sigma^\dagger U^\top b$ lies in the span of the first $r$ columns of $V$. Thus $A^\dagger b$ and $z$ are orthogonal, and we have $$\|x\|_2^2 = \|A^\dagger b\|_2^2 + \|z\|_2^2.$$ Choosing $z = 0$ therefore minimizes $\|x\|_2$, so $A^\dagger b$ is the minimum-norm solution.
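
To see the orthogonal decomposition in action, here is a final NumPy sketch (same assumed random test setup): it perturbs $A^\dagger b$ by a random nullspace vector $z$, confirms the perturbed point is still feasible, and checks the Pythagorean identity above.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical rank-2 test problem, 5x3.
m, n, r = 5, 3, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
b = rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A)
x_min = np.linalg.pinv(A) @ b

# The last n - r columns of V span the nullspace of A.
N = Vt.T[:, r:]
z = N @ rng.standard_normal(n - r)  # a random nullspace vector
x = x_min + z                       # another feasible point: A z = 0

assert np.allclose(A @ x, A @ x_min)             # still feasible
assert np.isclose(x_min @ z, 0.0)                # orthogonality
assert np.isclose(x @ x, x_min @ x_min + z @ z)  # Pythagorean identity
assert np.linalg.norm(x) >= np.linalg.norm(x_min)  # min-norm property
```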
