Suppose I have an overdetermined system of linear equations, $Ax=b$, where $A \in \mathbb{R}^{m \times n}$, $m > n$. In general, there is no exact solution, and we know that if $A$ has independent columns, then the best-fit or least-squares solution is given by the Moore-Penrose inverse of $A$ multiplied with $b$:
$x_{LS} = A^{+}b = [A^TA]^{-1}A^Tb$.
This can be obtained by writing out the least-squares optimisation problem and then solving it by setting the gradient of the objective with respect to $x$ to $0$. However, consider the following simpler derivation:
$Ax=b$
$[A^TA]^{-1}A^TAx=[A^TA]^{-1}A^Tb$ (left-multiplying both sides by the Moore-Penrose inverse)
$[A^TA]^{-1}[A^TA]x=[A^TA]^{-1}A^Tb$
$x=[A^TA]^{-1}A^Tb$
So this appears to suggest that the least-squares solution is an exact solution. There must be something wrong with this derivation, but I can't put my finger on it. What is wrong with it? And why does it end up giving the least-squares solution, even though I wasn't solving for that?
Best Answer
If the matrix $A$ has full column rank (that is, its columns are linearly independent), then a system $Ax=b$ has at most one solution.
In your derivation, you are implicitly assuming that a solution exists: the step of left-multiplying by $[A^TA]^{-1}A^T$ is a one-way implication, since that matrix is not invertible (it maps $\mathbb{R}^m$ to $\mathbb{R}^n$ with $m > n$), so you cannot reverse the step to recover $Ax=b$. And of course, if an exact solution exists, it is also the least-squares solution (with zero residual).
So what you proved is: *if* $x$ is a solution of $Ax=b$, where $A$ has full column rank, *then* $x=A^+b$. The converse does not follow.
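A quick numerical sketch of this point (the matrix and vector below are made up for illustration): when $b$ is not in the column space of $A$, the formula $x = [A^TA]^{-1}A^Tb$ still produces a well-defined vector, but that vector does not satisfy $Ax=b$ exactly; it only minimises $\|Ax-b\|$.

```python
import numpy as np

# An illustrative 3x2 overdetermined system; b is chosen to lie
# outside the column space of A, so Ax = b has no exact solution.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

# x_ls = (A^T A)^{-1} A^T b, computed via a linear solve rather
# than an explicit inverse (more stable, same result here).
x_ls = np.linalg.solve(A.T @ A, A.T @ b)

# The residual is nonzero: x_ls is NOT an exact solution of Ax = b.
residual = A @ x_ls - b
print(np.allclose(A @ x_ls, b))   # False

# But x_ls does minimise ||Ax - b||: any perturbed x does worse.
for x in (x_ls + 0.1, x_ls - 0.1):
    assert np.linalg.norm(A @ x - b) > np.linalg.norm(residual)
```

Running the "derivation" on this system quietly assumed `A @ x == b` had a solution; since it doesn't, the chain of equalities never held, and what survives is only the minimiser of the residual norm.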