Convex Analysis – Prove That the Nuclear Norm is Convex

convex-analysismatricesmatrix-normsnormed-spacesnuclear norm

For an $m \times n$ matrix, $A$, the nuclear norm of $A$ is defined as $\sum_{i}\sigma_{i}(A)$
where $\sigma_{i}(A)$ is the $i^{th}$ singular value of $A$. I've read that the nuclear norm is convex on the set of $m \times n$ matrices. I don't see how this true and can't find a proof online.

Best Answer

It is sufficient to prove that the nuclear norm is, in fact, a norm. It's trivial to verify that $\|A\|=0$ only if $A=0$, and that $\|tA\|=|t|\|A\|$ if $t$ is a scalar. The one non-trivial requirement is that the norm satisfies the triangle inequality; that is, $$\|A+B\|\leq \|A\|+\|B\|.$$ To do that, we're going to prove this: $$\sup_{\sigma_1(Q)\leq 1} \langle Q, A \rangle = \sup_{\sigma_1(Q)\leq 1} \mathop{\textrm{Tr}}(Q^HA) = \sum_i \sigma_i(A) = \|A\|.$$ Since $\sigma_1(\cdot)$ is itself a norm, what we're actually proving here is that the nuclear norm is dual to the spectral norm.

Let $A=U\Sigma V^H=\sum_i \sigma_i u_i v_i^H$ be the singular value decomposition of $A$, and define $Q=UV^H=UIV^H$. Then $\sigma_1(Q)=1$ by construction, and $$\langle Q, A \rangle = \langle UV^H, U\Sigma V^H \rangle = \mathop{\textrm{Tr}}(VU^HU\Sigma V^H) = \mathop{\textrm{Tr}}(V^HVU^HU\Sigma) = \mathop{\textrm{Tr}}(\Sigma) = \sum_i \sigma_i.$$ (Note our use of the identity $\mathop{\textrm{Tr}}(ABC)=\mathop{\textrm{Tr}}(CAB)$; this is always true when both multiplications are well-posed.) So we have established that $\sup_{\sigma_1(Q)\leq 1} \langle Q, A \rangle \geq \sum_i \sigma_i(A)$. Now let's prove the other direction: $$\sup_{\sigma_1(Q)\leq 1} \langle Q, A \rangle = \sup_{\sigma_1(Q)\leq 1} \mathop{\textrm{Tr}}(Q^HU\Sigma V^H) = \sup_{\sigma_1(Q)\leq 1} \mathop{\textrm{Tr}}(V^HQ^HU\Sigma) = \sup_{\sigma_1(Q)\leq 1} \langle U^HQV, \Sigma \rangle = \sup_{\sigma_1(Q)\leq 1} \sum_{i=1}^n \sigma_i (U^HQV)_{ii} = \sup_{\sigma_1(Q)\leq 1} \sum_{i=1}^n \sigma_i u_i^H Q v_i \leq \sup_{\sigma_1(Q)\leq 1} \sum_{i=1}^n \sigma_i \sigma_\max(Q) = \sum_{i=1}^n \sigma_i. $$ We have proven both the $\leq$ and $\geq$ cases, so equality is confirmed.

Why did we go through all of this trouble? To make proving the triangle inequality easy: $$\|A+B\|=\sup_{V:\sigma_\max(V)\leq 1} \langle V, A+B \rangle \leq \sup_{V:\sigma_\max(V)\leq 1} \langle V, A \rangle + \sup_{V:\sigma_\max(V)\leq 1} \langle V, B\rangle = \|A\| + \|B\|.$$

Related Question