[Math] Can Moore–Penrose pseudoinverse solve for underdetermined linear system

derivativeslinear algebraoptimization

Thanks for reading my thread.

I am thinking, many of us know that Moore–Penrose pseudoinverse can solve for overdetermined system
$Ax=b$, where $x=(A^TA)^{-1}A^Tb$; for exmplae the linear regression application , or curve fitting applications.

However, I am wondering for underdetermined system, can we use Moore–Penrose pseudoinverse solver? If yes, why we need many iterative reconstruction algorithm? Since we can know the derivative of the objective anyway, then why don't we just set the derivative to 0, then solve it using some skills like Moore–Penrose pseudoinverse?

Some explanation in theory is highly appreciated. Does not have to be rigorous prove, but something that makes sense. Thanks a lot!

Best Answer

If $A^TA$ is not positive definite (or it's not invertible, or equivalently some of its eigen values are zero), then we can either use the Moore–Penrose inverse, or we can use Tikhonov regularization.

Tikhonov regularization calculates the regular inverse of $(A^TA+\epsilon I)$ for some small $\epsilon$, and uses this inverse instead.

Moore–Penrose inverse is basically based on the SVD decomposition of $A^TA$ say $V\Lambda V^T$, where $V$ is a unitary matrix and $\Lambda$ is a diagonal matrix. Then $(A^TA)^{-1}=V\Lambda^{-1} V^T$, if an element $\lambda_i$ (a singular value) on the diagonal of $\Lambda$ is not zero then we replace it with $1/\lambda_i$ in $\Lambda^{-1}$, but if it's zero, we set the corresponding element to zero in $\Lambda^{-1}$.

Now we can answer your question, about why people use iterative solutions. There can be several reasons, one is when you have other constraints on the variable $x$, then $(A^TA)^{-1}A^Tb$ is not the solution. But the main reason is that for large scale $A$, calculating the SVD decomposition for example can be expensive, and at the same time it maybe not memory efficient. Some times you don't have access to full $A$, and can only observe a sparse representation of it, and so on.

Related Question