Why does the Moore-Penrose inverse appear to give an exact solution for overdetermined linear systems?

linear algebra

Suppose I have an overdetermined system of linear equations, $Ax=b$, where $A \in \mathbb{R}^{m \times n}$, $m > n$. In general, there is no exact solution, and we know that if $A$ has linearly independent columns, then the best-fit or least-squares solution is given by the Moore-Penrose inverse of $A$ applied to $b$:

$x_{LS} = A^{+}b = [A^TA]^{-1}A^Tb$.
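As a quick numerical check of this formula (the matrix and vector below are made up for illustration), the normal-equations expression $[A^TA]^{-1}A^Tb$ agrees with NumPy's least-squares solver:

```python
import numpy as np

# A hypothetical 3x2 overdetermined system (m = 3 > n = 2) with
# linearly independent columns, chosen purely for illustration.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 0.0])

# Least-squares solution via the normal-equations formula.
x_ls = np.linalg.inv(A.T @ A) @ A.T @ b

# Reference solution from NumPy's least-squares solver.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_ls, x_ref))  # True
```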

This can be obtained by writing out the least-squares optimisation problem and then solving it by setting the gradient of the objective with respect to $x$ to $0$. However, consider the following simpler derivation:

$Ax=b$

$[A^TA]^{-1}A^TAx=[A^TA]^{-1}A^Tb$ (left-multiplying both sides by the Moore-Penrose inverse)

$[A^TA]^{-1}[A^TA]x=[A^TA]^{-1}A^Tb$

$x=[A^TA]^{-1}A^Tb$

So this appears to suggest that the least-squares solution is an exact solution. There must be something wrong with this derivation, but I can't put my finger on it. What is wrong with it? And how come it ends up giving the least-squares solution, even though I wasn't solving for that?
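To make the puzzle concrete, here is a small made-up example where $b$ does not lie in the column space of $A$: the $x$ produced by the derivation above does not actually satisfy $Ax=b$:

```python
import numpy as np

# Hypothetical overdetermined system for illustration; b is chosen
# so that it does NOT lie in the column space of A.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 0.0])

# The x obtained from the derivation in the question.
x = np.linalg.inv(A.T @ A) @ A.T @ b

# It is the least-squares minimiser, but not an exact solution:
print(np.allclose(A @ x, b))  # False: Ax = b has no solution here
```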

Best Answer

If the matrix $A$ has full column rank (that is, its columns are linearly independent), then a system $Ax=b$ has at most one solution.

In your derivation, you are assuming that a solution exists, and of course, if a solution exists, it is also the least-squares solution.

So what you proved is: if $x$ is a solution of $Ax=b$, where $A$ has full column rank, then $x=A^+b$.
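This can be illustrated numerically (with a made-up full-column-rank matrix): when $b$ is constructed to lie in the column space of $A$, a solution exists, and $A^+b$ recovers it exactly:

```python
import numpy as np

# Hypothetical full-column-rank matrix, for illustration only.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Consistent case: build b inside the column space, so Ax = b has
# a (unique) solution, namely x_true.
x_true = np.array([2.0, -1.0])
b = A @ x_true

# The pseudoinverse recovers that unique solution.
x = np.linalg.pinv(A) @ b
print(np.allclose(x, x_true))  # True
```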