Unconstrained matrix factorization vs SVD

gradient descent, matrix decomposition, optimization, svd

SVD decomposes a matrix $M$ into the product of two orthogonal matrices, $U$ and $V$, and a diagonal matrix of singular values $\Sigma$:

$ M = U \Sigma V^*$

In this case, $U$ and $V$ are truncated to the first $n$ singular vectors (and $\Sigma$ to the corresponding singular values) in order to compress $M$.
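For concreteness, a minimal numpy sketch of this truncated-SVD compression (the matrix shape and the value of $n$ are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((60, 40))  # example matrix; shape chosen arbitrarily
n = 5                              # number of singular vectors/values kept

# Thin SVD, then keep only the first n singular triplets.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_n = (U[:, :n] * s[:n]) @ Vt[:n]  # rank-n reconstruction U_n Σ_n V_n^*

mse_svd = np.mean((M - M_n) ** 2)
```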

I'm exploring an alternative unconstrained factorization of $M$ using gradient descent to find $U$ and $V$ with the same dimensions as before:

$ \underset{U,V}{\mathrm{argmin}} \: \mathbf{MSE}(M - UV) $
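To make the setup concrete, here is a rough numpy sketch of this unconstrained factorization with plain gradient descent (the learning rate, iteration count, and initialization scale are ad hoc assumptions, not tuned values):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((60, 40))  # same example matrix as above
n = 5                              # target rank

# Unconstrained factors with the same shapes as the truncated SVD factors.
U = 0.1 * rng.standard_normal((60, n))
V = 0.1 * rng.standard_normal((n, 40))

lr = 0.01
for _ in range(50_000):
    R = M - U @ V        # residual
    gU = -2 * R @ V.T    # gradient of ||R||_F^2 with respect to U
    gV = -2 * U.T @ R    # gradient of ||R||_F^2 with respect to V
    U -= lr * gU
    V -= lr * gV

mse_gd = np.mean((M - U @ V) ** 2)

# Rank-n truncated SVD baseline; once gradient descent has converged,
# the two MSE values should nearly coincide.
Uf, s, Vt = np.linalg.svd(M, full_matrices=False)
mse_svd = np.mean((M - (Uf[:, :n] * s[:n]) @ Vt[:n]) ** 2)
print(mse_gd, mse_svd)
```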

My results indicate that both methods achieve the same mean square error ($\mathbf{MSE}$) up to a surprising level of accuracy. The resulting factorizations are of course different in each case: the $U$ and $V$ found with the gradient descent method are not orthogonal.

My questions are:

  • Given that the second method has no constraints, I would have supposed it should be able to find better factorizations than SVD in the $\mathbf{MSE}$ sense. Is this assumption right?
  • Is there any property of $M$, such as its rank or dimensions, that can be extracted from this result? The orthogonality constraint apparently does not limit the SVD factorization, but I'm not sure whether this holds in general or only for certain types of $M$.

Any thoughts about this or links to relevant literature would be greatly appreciated.

Best Answer

Your first question is not technically a question, but the supposition behind it is incorrect. No, it is not true that you can find a rank-$n$ factorization that comes closer in the mean square error sense (I assume that is what MSE stands for). The Eckart-Young-Mirsky (EYM) theorem guarantees that the approximation attained by truncating the SVD is optimal among all matrices of rank at most $n$.
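For reference, the Frobenius-norm form of the theorem states that for every matrix $B$ with $\mathrm{rank}(B) \le n$,

$ \|M - B\|_F^2 \ge \sum_{i > n} \sigma_i^2 $

where the $\sigma_i$ are the singular values of $M$, and the truncated SVD attains this bound. Since any product $UV$ with your dimensions has rank at most $n$, gradient descent can at best match the truncated SVD's MSE, never beat it.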

Your second question is indeed a question, but I don't know what you mean by "extracting a property of $M$ from this result". In particular, I don't know what you mean by "extracting", "a property of $M$", or "this result".