[Math] The Proximal Operator of the Nuclear Norm / Schatten Norm

convex optimization, nuclear norm, optimization, proximal-operators, regularization

\begin{equation}
\arg\min_{X} \frac{1}{2}\|X-Y\|_{F}^2 + \tau\|X\|_{*}
\end{equation}
where $\tau\geq 0$, $Y\in \mathbb{C}^{n\times n}$, and $\|\cdot\|_{*}$ is the nuclear norm. What is the solution of this convex optimization problem?

Some of the literature shows that in the real case (where $Y\in \mathbb{R}^{n\times n}$) the solution of this problem is $\mathcal{D}_{\tau}(Y)$, where $\mathcal{D}_{\tau}$ is the soft-thresholding operator applied to the singular values of $Y$. What is the solution in the complex case (where $Y\in \mathbb{C}^{n\times n}$)? Is it exactly the same, namely $\mathcal{D}_{\tau}(Y)$?

Best Answer

Basically, for any Schatten Norm the algorithm is pretty simple.

Using capital letters (e.g. $ A $) for matrices and lowercase letters for vectors, the proximal operator is defined as:

$$ {\operatorname*{Prox}}_{\lambda \left\| \cdot \right\|_{p}} \left( A \right) = \arg \min_{X} \frac{1}{2} \left\| X - A \right\|_{F}^{2} + \lambda \left\| X \right\|_{p} $$

where $ \left\| X \right\|_{p} $ is the Schatten $ p $-norm of $ X $.

Define $ \boldsymbol{\sigma} \left( X \right) $ as the vector of singular values of $ X $ (see the Singular Value Decomposition).

The proximal operator is then calculated as follows:

  1. Apply the SVD to $ A $: $ A = U \operatorname*{diag} \left( \boldsymbol{\sigma} \left( A \right) \right) {V}^{H} $, where $ {V}^{H} $ is the conjugate transpose of $ V $ (equal to $ {V}^{T} $ when $ A $ is real).
  2. Extract the vector of singular values $ \boldsymbol{\sigma} \left( A \right) $.
  3. Calculate the proximal operator of the extracted vector using the vector $ p $-norm: $ \hat{\boldsymbol{\sigma}} \left( A \right) = {\operatorname*{Prox}}_{\lambda \left\| \cdot \right\|_{p}} \left( \boldsymbol{\sigma} \left( A \right) \right) = \arg \min_{x} \frac{1}{2} \left\| x - \boldsymbol{\sigma} \left( A \right) \right\|_{2}^{2} + \lambda \left\| x \right\|_{p} $.
  4. Return the proximal operator of the matrix norm: $ \hat{A} = {\operatorname*{Prox}}_{\lambda \left\| \cdot \right\|_{p}} \left( A \right) = U \operatorname*{diag} \left( \hat{\boldsymbol{\sigma}} \left( A \right) \right) {V}^{H} $.
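
The four steps above can be sketched in NumPy as follows. This is only a minimal illustration, not a reference implementation: the names `prox_schatten`, `soft_threshold`, and the `vector_prox` callback are made up for this example, and the vector proximal operator is passed in so that the same routine covers any $ p $ for which it is available.

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of lam * ||x||_1 for a real vector x (hypothetical helper)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_schatten(A, lam, vector_prox):
    """Apply a vector proximal operator to the singular values of A.

    `vector_prox(s, lam)` is assumed to return the prox of the matching
    vector p-norm evaluated at the singular value vector s.
    """
    # Step 1: thin SVD, A = U @ diag(s) @ Vh, where Vh is the conjugate transpose of V.
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    # Steps 2-3: proximal operator of the vector norm on the singular values.
    s_hat = vector_prox(s, lam)
    # Step 4: rebuild the matrix; U * s_hat scales the columns of U.
    return (U * s_hat) @ Vh

# Nuclear norm (p = 1) on a small diagonal example.
A = np.array([[3.0, 0.0], [0.0, 1.0]])
X = prox_schatten(A, 2.0, soft_threshold)
```

Here the singular values $ (3, 1) $ are shrunk by $ \lambda = 2 $ to $ (1, 0) $, so `X` is the diagonal matrix with entries 1 and 0.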

The common matrix norms map to Schatten norms as follows:

  • Frobenius Norm - Given by $ p = 2 $ in Schatten Norm.
  • Nuclear Norm - Given by $ p = 1 $ in Schatten Norm.
  • Spectral Norm (The $ {L}_{2} $ Induced Norm of a Matrix) - Given by $ p = \infty $ in Schatten Norm.
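
For reference, the vector proximal operators needed in step 3 for these three cases have well-known closed forms. The notation below is introduced only for this summary: $ v $ stands for the input vector (here $ \boldsymbol{\sigma} \left( A \right) $), $ \hat{x} $ for the result, and $ \lambda > 0 $ is assumed. The $ p = \infty $ case follows from the Moreau decomposition, since the dual of the $ {L}_{\infty} $ norm is the $ {L}_{1} $ norm.

$$ \begin{aligned} p = 1: &\quad \hat{x}_{i} = \operatorname{sign} \left( v_{i} \right) \max \left( \left| v_{i} \right| - \lambda, 0 \right) \\ p = 2: &\quad \hat{x} = \left( 1 - \frac{\lambda}{\max \left( \left\| v \right\|_{2}, \lambda \right)} \right) v \\ p = \infty: &\quad \hat{x} = v - \operatorname{Proj}_{\left\{ u \, : \, \left\| u \right\|_{1} \leq \lambda \right\}} \left( v \right) \end{aligned} $$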

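So in your case use the Schatten norm with $ p = 1 $.
The proximal operator of the vector $ {L}_{1} $ norm is the soft-thresholding operator, which for nonnegative singular values reduces to $ \hat{\sigma}_{i} = \max \left( \sigma_{i} \left( A \right) - \lambda, 0 \right) $. Singular values are real and nonnegative for complex matrices as well, so the answer in the complex case is the same $ \mathcal{D}_{\tau} \left( Y \right) $, provided the SVD is written with the conjugate transpose: $ Y = U \operatorname*{diag} \left( \boldsymbol{\sigma} \left( Y \right) \right) {V}^{H} $.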
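
As a quick numerical sanity check for the complex case, the sketch below thresholds the singular values of a random complex matrix and compares the objective from the question against nearby perturbed points. The function names `svt` and `objective`, the matrix size, the value of `tau`, and the perturbation scale are all arbitrary choices for this illustration; a check like this supports the claim but is not a proof.

```python
import numpy as np

def svt(Y, tau):
    """Soft-threshold the singular values of Y (works for real or complex Y)."""
    U, s, Vh = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vh

def objective(X, Y, tau):
    """0.5 * ||X - Y||_F^2 + tau * ||X||_*, with the nuclear norm as the sum of singular values."""
    nuclear = np.linalg.svd(X, compute_uv=False).sum()
    return 0.5 * np.linalg.norm(X - Y, "fro") ** 2 + tau * nuclear

rng = np.random.default_rng(0)
n, tau = 4, 0.7
Y = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

X_star = svt(Y, tau)
f_star = objective(X_star, Y, tau)

# Evaluate the objective at 100 small random complex perturbations of X_star.
perturbed = []
for _ in range(100):
    noise = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    perturbed.append(objective(X_star + 1e-3 * noise, Y, tau))

# Smallest difference between a perturbed objective and the candidate minimum.
worst_gap = min(perturbed) - f_star
```

Because the objective is strongly convex, `worst_gap` should come out positive if `X_star` is indeed the minimizer.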