From what I've observed in practice, the SVD gives the best rank-1 approximation with respect to the Frobenius norm. But I've also heard that it minimizes the approximation error in the $L_2$ (spectral) matrix norm. So: does the SVD give the best rank-1 approximation with respect to the Frobenius norm, the $L_2$ norm, or both?
[Math] Does SVD give the best rank 1 approximation with respect to the Frobenius norm, L2 norm, or both
linear-algebra, matrices, matrix-decomposition, normed-spaces, svd
Related Solutions
The solution to the problem $$\min_{\tilde{A}} \quad \| A - \tilde{A} \| \quad \text{s.t.} \quad \text{rank}(\tilde{A}) \le r$$ is given by the truncated SVD, as you say, for any norm $\|\cdot\|$ that is unitarily invariant, i.e. unchanged by unitary transformations. This covers any norm that is a vector norm of the singular values, since unitary transformations do not affect the spectrum: e.g. the Frobenius norm (the 2-norm of the spectrum), the nuclear norm (the 1-norm of the spectrum), and the operator norm (the $\infty$-norm of the spectrum).
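As a quick numerical check (a NumPy sketch; the shapes, seed, and rank are arbitrary choices of mine), truncating the SVD after $r$ singular values attains exactly the Eckart–Young error in both the Frobenius and operator norms:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

# Truncated SVD: keep only the r largest singular values.
r = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_r = (U[:, :r] * s[:r]) @ Vt[:r, :]

# Eckart-Young: the Frobenius-norm error is the root of the sum of the
# discarded squared singular values; the operator-norm error is the
# largest discarded singular value.
fro_err = np.linalg.norm(A - A_r, "fro")
op_err = np.linalg.norm(A - A_r, 2)
assert np.isclose(fro_err, np.sqrt(np.sum(s[r:] ** 2)))
assert np.isclose(op_err, s[r])
```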
To my knowledge, there is no closed form for the entry-wise 1-norm. What is often done for these harder norms (the most common case is the Frobenius norm on a limited support, i.e. minimizing the entry-wise squared distance from a partially observed matrix, where the distance is taken only over the observed entries; this is the noisy low-rank matrix completion problem) is to optimize $$\min_{\tilde{A}} \quad \| A - \tilde{A} \|+\lambda\|\tilde A\|_*$$ where $\|\cdot\|_*$ is the nuclear norm. This works because the nuclear norm is the convex envelope of the rank function on the operator-norm ball. For the noisy matrix completion problem there are a bunch of guarantees on recovery of the minimal-rank matrix in terms of restricted isometry conditions (google restricted isometry and matrix rank). This problem can be solved by an SDP, and it is also exactly in the formulation appropriate for accelerated first-order methods.
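For intuition about those first-order methods: the proximal operator of the nuclear norm simply soft-thresholds the singular values. A minimal sketch (the helper name `svt`, the threshold, and the test matrix are my own illustrative choices, not from any particular library):

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: the proximal operator of
    tau * ||.||_*, i.e. argmin_Z 0.5 * ||Z - X||_F^2 + tau * ||Z||_*.
    It soft-thresholds the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
Z = svt(A, tau=1.0)

# Shrinking the spectrum can only reduce the rank and the nuclear norm.
assert np.linalg.matrix_rank(Z) <= np.linalg.matrix_rank(A)
assert np.linalg.norm(Z, "nuc") <= np.linalg.norm(A, "nuc")
```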
For an exact solution, you'd have to do some sort of SDP branch-and-bound, or write the problem in terms of a low-rank factorization of $\tilde A$ and solve a potentially non-convex quadratic program, perhaps by alternating minimization: fix the left factor, so the problem becomes linear in the right factor and is easy to solve; then fix the right factor and solve for the left; and repeat.
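The alternate-and-solve idea in the last sentence can be sketched as alternating least squares. On a fully observed matrix (where the SVD already gives the exact answer) it should recover the same optimal error, which makes it a convenient sanity check; the seed and iteration count below are arbitrary:

```python
import numpy as np

def als_low_rank(A, r, iters=500):
    """Alternating least squares for min ||A - L @ R||_F^2 over
    L (m x r) and R (r x n): with one factor fixed, the other is
    a linear least-squares solve."""
    m, n = A.shape
    rng = np.random.default_rng(0)
    L = rng.standard_normal((m, r))
    for _ in range(iters):
        R = np.linalg.lstsq(L, A, rcond=None)[0]        # fix L, solve for R
        L = np.linalg.lstsq(R.T, A.T, rcond=None)[0].T  # fix R, solve for L
    return L, R

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
L, R = als_low_rank(A, r=2)

# Compare against the optimal rank-2 error given by the SVD.
s = np.linalg.svd(A, compute_uv=False)
opt = np.sqrt(np.sum(s[2:] ** 2))
err = np.linalg.norm(A - L @ R, "fro")
assert np.isclose(err, opt, rtol=1e-2)
```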
$\DeclareMathOperator{\tr}{tr}$ First note that $\tr(A^*B) =\langle A, B\rangle_F$, i.e. the trace of $A^*B$ is equal to the Frobenius inner product of the two matrices. Now let $A=U S V^*$ and $M = US_kV^*$, where $S$ is the diagonal matrix containing the singular values $\sigma_i$ and $S_k$ is the same with $\sigma_i$ replaced by $0$ for $i>k$. The important fact to remember now is that the $2$-norm (and the Frobenius norm) is invariant under orthogonal (unitary) transformations, i.e.
$$ \|A-M\|_F^2 = \|USV^* - US_k V^*\|_F^2 = \|U(S-S_k)V^*\|_F^2 = \|S-S_k\|_F^2 = \smash{\sum_{i=k+1}^n \sigma_i^2}$$
This can also be seen via the cyclic property of the trace:
$$\begin{aligned} \tr\bigl((A-M)^*(A-M)\bigr) &= \tr\bigl( (USV^* - US_k V^*)^*(USV^* - US_k V^*)\bigr) \\ &=\tr\bigl( (U(S - S_k) V^*)^*(U(S-S_k) V^*)\bigr) \\ &=\tr\bigl( V (S-S_k) U^*U(S-S_k)V^*\bigr) \\ &=\tr\bigl( (S-S_k)^2V^*V\bigr) = \tr\bigl((S-S_k)^2\bigr) = \smash{\sum_{i=k+1}^n} \sigma_i^2\\ \end{aligned}$$
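The trace identity above is easy to confirm numerically (a small NumPy sketch with arbitrary sizes and seed):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Build M = U S_k V^* by zeroing all singular values past the k-th.
k = 1
S_k = np.diag(np.where(np.arange(len(s)) < k, s, 0.0))
M = U @ S_k @ Vt

# tr((A-M)^T (A-M)) equals the sum of the discarded squared singular
# values, which is also ||A - M||_F^2.
lhs = np.trace((A - M).T @ (A - M))
assert np.isclose(lhs, np.sum(s[k:] ** 2))
assert np.isclose(lhs, np.linalg.norm(A - M, "fro") ** 2)
```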
Best Answer
The SVD produces the best rank-1 approximation (or more generally, the best approximation of at most any fixed rank) with respect to both the Frobenius and $L_2$ norms. Proofs of these results are given on this wiki page.