Moore-Penrose inverse of a matrix that is both row- and column-deficient

linear algebrapseudoinverse

As discussed in detail in What forms does the Moore-Penrose inverse take under systems with full rank, full column rank, and full row rank?, for a system of equations

$$Ax=b,$$

if $A$ is full-rank in rows but rank-deficient in columns (the system is under constrained), the Moore-Penrose inverse of $A$ finds the minimum-norm solution for system of equations, i.e.

$$x=A^{+}b$$

is the solution to the original equation for which $\|{x}\|_{2}$ is smallest.

Conversely, if $A$ is full-rank in columns but rank-deficient in rows (the system is over constrained), the Moore-Penrose inverse of $A$ finds the least-squared-error approximate solution of the system of equations, i.e.,

$$x=A^{+}b$$

is the $x$ for which $\|Ax-b\|_{2}$ is smallest.

What happens if $A$ is rank-deficient in both rows and columns (e.g., there are more columns than rows, but fewer independent columns than rows)? Do the norm minimizations end up acting in sequence, so that

$$x=A^{+}b$$

minimizes $\|x\|_{2}$ over all solutions that minimize $\|Ax-b\|_{2}$, or is there some interaction between the norm minimizations so that they cannot be taken in sequence, and instead $A^{+}b$ minimizes some combined norm on the input and output spaces?

Best Answer

You are correct, in the fully rank-deficient case, $A^{\dagger}b$ is the minimum norm least-squared solution to the linear system $Ax = b$.

This can be seen by viewing $A^{\dagger}$ in terms of orthogonal complements of the range and null space. Let us conflate the $m \times n$ matrix $A$ with the linear transformation $A$ which maps $\mathbb{R}^n$ to $\mathbb{R}^m$. Then the pseudoinverse $A^{\dagger}$ maps $\mathbb{R}^m$ to $\mathbb{R}^n$ with the following property: if you decompose $\mathbb{R}^m = R(A) \oplus R(A)^{\perp}$, then $A^{\dagger}$ maps $R(A)$ to $\ker(A)^{\perp}$ and maps $R(A)^{\perp}$ to $0$.

Then we look at $Ax = b$. Then the set of least-squares solutions are all such $x$ such that $Ax = \mathrm{Proj}_{R(A)} b$, the nearest vector to $b$ contained in $R(A)$. However, since $A$ (as a matrix) is column-rank-deficient, it has a non-trivial kernel, and therefore $x$ is not unique and can be expressed by $x = x_p + x_n$, where $x_n \in \ker(A)$ and $x_p \in \ker(A)^{\perp}$ such that $Ax_p = \mathrm{Proj}_{R(A)} b$. Then $\|x\|^2 = \|x_p\|^2 + \|x_n\|^2$, so clearly $\|x\|$ is minimized when $x_n = 0$, and $A^{\dagger}$ is precisely that matrix which takes $b$ to $x_p$.

Related Question