Optimization – Orthogonal Projection onto the $L_1$ Unit Ball

convex optimizationconvex-analysishilbert-spacesoptimization

What is the Orthogonal Projection onto the $ {L}_{1} $ Unit Ball?

Namely, given $ x \in {\mathbb{R}}^{n} $ what would be:

$$ {\mathcal{P}}_{ { \left\| \cdot \right\| }_{1} \leq 1 } \left( x \right) = \arg \min_{{ \left\| y \right\| }_{1} \leq 1} \left\{ {\left\| y – x \right\|}_{2}^{2} \right\} $$

Thank You.

Best Answer

$$ \DeclareMathOperator{\sign}{sign} $$

The Lagrangian of the problem can be written as:

$$ \begin{align} L \left( x, \lambda \right) & = \frac{1}{2} {\left\| x - y \right\|}^{2} + \lambda \left( {\left\| x \right\|}_{1} - 1 \right) && \text{} \\ & = \sum_{i = 1}^{n} \left( \frac{1}{2} { \left( {x}_{i} - {y}_{i} \right) }^{2} + \lambda \left| {x}_{i} \right| \right) - \lambda && \text{Component wise form} \end{align} $$

The Dual Function is given by:

$$ \begin{align} g \left( \lambda \right) = \inf_{x} L \left( x, \lambda \right) \end{align} $$

The above can be solved component wise for the term $ \left( \frac{1}{2} { \left( {x}_{i} - {y}_{i} \right) }^{2} + \lambda \left| {x}_{i} \right| \right) $ which is solved by the soft Thresholding Operator:

$$ \begin{align} {x}_{i}^{\ast} = \sign \left( {y}_{i} \right) { \left( \left| {y}_{i} \right| - \lambda \right) }_{+} \end{align} $$

Where $ {\left( t \right)}_{+} = \max \left( t, 0 \right) $.

Now, all needed is to find the optimal $ \lambda \geq 0 $ which is given by the root of the objective function (Which is the constrain of the KKT Sytsem):

$$ \begin{align} h \left( \lambda \right) & = \sum_{i = 1}^{n} \left| {x}_{i}^{\ast} \left( \lambda \right) \right| - 1 \\ & = \sum_{i = 1}^{n} { \left( \left| {y}_{i} \right| - \lambda \right) }_{+} - 1 \end{align} $$

The above is a Piece Wise linear function of $ \lambda $ and its Derivative given by:

$$ \begin{align} \frac{\mathrm{d} }{\mathrm{d} \lambda} h \left( \lambda \right) & = \frac{\mathrm{d} }{\mathrm{d} \lambda} \sum_{i = 1}^{n} { \left( \left| {y}_{i} \right| - \lambda \right) }_{+} \\ & = \sum_{i = 1}^{n} -{ \mathbf{1} }_{\left\{ \left| {y}_{i} \right| - \lambda > 0 \right\}} \end{align} $$

Hence it can be solved using Newton Iteration.

In a similar manner the projection onto the Simplex (See @Ashkan answer) can be calculated.
The Lagrangian in that case is given by:

$$ \begin{align} L \left( x, \mu \right) & = \frac{1}{2} {\left\| x - y \right\|}^{2} + \mu \left( \boldsymbol{1}^{T} x - 1 \right) && \text{} \\ \end{align} $$

The trick is to leave non negativity constrain implicit.
Hence the Dual Function is given by:

$$ \begin{align} g \left( \mu \right) & = \inf_{x \succeq 0} L \left( x, \mu \right) && \text{} \\ & = \inf_{x \succeq 0} \sum_{i = 1}^{n} \left( \frac{1}{2} { \left( {x}_{i} - {y}_{i} \right) }^{2} + \mu {x}_{i} \right) - \mu && \text{Component wise form} \end{align} $$

Again, taking advantage of the Component Wise form the solution is given:

$$ \begin{align} {x}_{i}^{\ast} = { \left( {y}_{i} - \mu \right) }_{+} \end{align} $$

Where the solution includes the non negativity constrain by Projecting onto $ {\mathbb{R}}_{+} $

Again, the solution is given by finding the $ \mu $ which holds the constrain (Pay attention, since the above was equality constrain, $ \mu $ can have any value and it is not limited to non negativity as $ \lambda $ above).

The objective function (From the KKT) is given by:

$$ \begin{align} h \left( \mu \right) = \sum_{i = 1}^{n} {x}_{i}^{\ast} - 1 & = \sum_{i = 1}^{n} { \left( {y}_{i} - \mu \right) }_{+} - 1 \end{align} $$

The above is a Piece Wise linear function of $ \mu $ and its Derivative given by:

$$ \begin{align} \frac{\mathrm{d} }{\mathrm{d} \mu} h \left( \mu \right) & = \frac{\mathrm{d} }{\mathrm{d} \mu} \sum_{i = 1}^{n} { \left( {y}_{i} - \mu \right) }_{+} \\ & = \sum_{i = 1}^{n} -{ \mathbf{1} }_{\left\{ {y}_{i} - \mu > 0 \right\}} \end{align} $$

Hence it can be solved using Newton Iteration.

I wrote MATLAB code which implements them both at Mathematics StackExchange Question 2327504 - GitHub.
There is a test which compares the result to a reference calculated by CVX.

Related Question