The Sub Gradient and the Proximal Operator of the $ {L}_{2, 1} $ Norm (Mixed Norm)

convex-optimization, convex-analysis, linear-algebra, multivariable-calculus, proximal-operators

What would be the Sub Gradient of

$$ f \left( X \right) = {\left\| A X \right\|}_{2, 1} $$

Where $ X \in \mathbb{R}^{m \times n} $, $ {A} \in \mathbb{R}^{k \times m} $ and $ {\left\| Y \right\|}_{2, 1} = \sum_{j} \sqrt{ \sum_{i} {Y}_{i,j}^{2} } $.

What would be the Prox of:

$$ \operatorname{Prox}_{\lambda {\left\| \cdot \right\|}_{2,1}} \left( Y \right) = \arg \min_{X} \frac{1}{2} {\left\| X - Y \right\|}_{F}^{2} + \lambda {\left\| X \right\|}_{2, 1}, \; X, Y \in {\mathbb{R}}^{m \times n} $$

Can either be generalized to $ {\left\| \cdot \right\|}_{q, p} $?

Best Answer

In the following I will use MATLAB's notation, where the : operator selects a column of a matrix.

Sub Gradient of $ {L}_{2, 1} $ Mixed Norm

$$ f \left( X \right) = {\left\| A X \right\|}_{2, 1} = \sum_{i} {\left\| A {X}_{:, i} \right\|}_{2} $$

Now, for a vector $ x $ with $ A x \neq 0 $, the gradient is:

$$ \frac{\mathrm{d} {\left\| A x \right\|}_{2} }{\mathrm{d} x} = \frac{ {A}^{T} A x }{ {\left\| A x \right\|}_{2} } $$
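As a sanity check, here is a minimal finite differences sketch of this formula in MATLAB (the dimensions and the step size h are arbitrary choices for illustration, not taken from the original code):

    % Finite differences check of d || A x ||_2 / dx = A' * A * x / || A x ||_2
    % The dimensions and the step size h are arbitrary illustrative choices.
    k = 5; m = 4; h = 1e-6;
    A = randn(k, m);
    x = randn(m, 1);

    vGradAnalytic = (A.' * A * x) / norm(A * x, 2);

    vGradNumeric = zeros(m, 1);
    for ii = 1:m
        e = zeros(m, 1);
        e(ii) = 1;
        vGradNumeric(ii) = (norm(A * (x + h * e), 2) - norm(A * (x - h * e), 2)) / (2 * h);
    end

    max(abs(vGradAnalytic - vGradNumeric)) % Should be close to zero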

Which implies:

$$\begin{align*} \frac{\mathrm{d} {\left\| A X \right\|}_{2, 1} }{\mathrm{d} X} & = \frac{\mathrm{d} \sum_{j} {\left\| A {X}_{:, j} \right\|}_{2} }{\mathrm{d} X} && \text{} \\ & = \sum_{i} \frac{\mathrm{d} \sum_{j} {\left\| A {X}_{:, j} \right\|}_{2} }{\mathrm{d} {X}_{:, i}} \boldsymbol{e}_{i}^{T} && \text{Where $ \boldsymbol{e}_{i} $ is the standard $ i $-th basis vector} \\ & = \sum_{i} \frac{\mathrm{d} {\left\| A {X}_{:, i} \right\|}_{2} }{\mathrm{d} {X}_{:, i}} \boldsymbol{e}_{i}^{T} && \text{Only the $ i $-th term depends on $ {X}_{:, i} $} \\ & = \sum_{i} \frac{ {A}^{T} A {X}_{:, i} }{ {\left\| A {X}_{:, i} \right\|}_{2} } \boldsymbol{e}_{i}^{T} && \text{} \\ & = {A}^{T} A X D \end{align*}$$

Where

$$ D = \operatorname{diag} \left\{ {d}_{1}, {d}_{2}, \ldots, {d}_{n} \right\}, \; {d}_{i} = \begin{cases} 0 & \text{ if } {\left\| A {X}_{:, i} \right\|}_{2} = 0 \\ \frac{1}{{\left\| A {X}_{:, i} \right\|}_{2}} & \text{ if } {\left\| A {X}_{:, i} \right\|}_{2} \neq 0 \end{cases} $$

Remark
For a column of zeros in $ X $ the Sub Gradient of the $ {L}_{2} $ norm at that vector is any vector with $ {L}_{2} $ norm less than or equal to one. In the case above it was chosen to be the zero vector, which indeed has a norm less than 1.
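A minimal MATLAB sketch of the sub gradient above, using the zero vector convention for vanishing columns (the dimensions are arbitrary choices for illustration, not taken from the original code):

    % Sub gradient of || A X ||_{2, 1} as A' * A * X * D, with zero columns
    % of A * X mapped to zero columns of the sub gradient.
    k = 5; m = 4; n = 3;
    A = randn(k, m);
    X = randn(m, n);

    vColNorms = sqrt(sum((A * X) .^ 2, 1));      % || A * X(:, i) ||_2 per column
    vD = zeros(1, n);
    vD(vColNorms > 0) = 1 ./ vColNorms(vColNorms > 0);
    mSubGrad = A.' * (A * X) * diag(vD);         % The m x n sub gradient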

Prox of $ {L}_{2, 1} $ Mixed Norm

The problem is given by:

$$ \arg \min_{X} \frac{1}{2} {\left\| X - Y \right\|}_{F}^{2} + \lambda {\left\| X \right\|}_{2, 1} $$

Where $ X, Y \in \mathbb{R}^{m \times n} $.

Again, this can be decomposed into working on each column of $ X $ separately:

$$\begin{aligned} \arg \min_{X} \frac{1}{2} {\left\| X - Y \right\|}_{F}^{2} + \lambda {\left\| X \right\|}_{2, 1} & = \arg \min_{X} \sum_{i} \left( \frac{1}{2} {\left\| {X}_{:, i} - {Y}_{:, i} \right\|}_{2}^{2} + \lambda {\left\| {X}_{:, i} \right\|}_{2} \right) \\ & = \arg \min_{X} \left( \frac{1}{2} {\left\| {X}_{:, 1} - {Y}_{:, 1} \right\|}_{2}^{2} + \lambda {\left\| {X}_{:, 1} \right\|}_{2} \right) \\ & \quad + \left( \frac{1}{2} {\left\| {X}_{:, 2} - {Y}_{:, 2} \right\|}_{2}^{2} + \lambda {\left\| {X}_{:, 2} \right\|}_{2} \right) \\ & \quad + \cdots \\ & \quad + \left( \frac{1}{2} {\left\| {X}_{:, n} - {Y}_{:, n} \right\|}_{2}^{2} + \lambda {\left\| {X}_{:, n} \right\|}_{2} \right) \end{aligned}$$

Each term in the brackets is an independent Prox problem of the $ {L}_{2} $ norm.
Hence the solution is given by:

$$ \hat{X} = \arg \min_{X} \frac{1}{2} {\left\| X - Y \right\|}_{F}^{2} + \lambda {\left\| X \right\|}_{2, 1} $$

Where $ \hat{X}_{:, i} = {Y}_{:, i} \left( 1 - \frac{\lambda}{\max \left( {\left\| {Y}_{:, i} \right\|}_{2} , \lambda \right)} \right) $

MATLAB Code

I implemented code to verify the results against a numerical derivative (finite differences) and CVX (as a reference for the Prox).
The full code is in my StackExchange Mathematics Q3307741 GitHub Repository.
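As a minimal sketch (not the repository code), the closed form above can be applied column by column as follows; the value of paramLambda and the dimensions are arbitrary choices for illustration:

    % Column wise Prox of the L2,1 norm (block soft thresholding).
    paramLambda = 0.5;
    Y = randn(4, 3);

    vColNorms = sqrt(sum(Y .^ 2, 1));                            % || Y(:, i) ||_2 per column
    vShrink   = 1 - paramLambda ./ max(vColNorms, paramLambda);  % Zero whenever || Y(:, i) ||_2 <= lambda
    mX        = Y .* vShrink;                                    % Implicit expansion over rows (R2016b and later)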
