Optimization – Orthogonal Projection onto the Unit Simplex

Tags: convex optimization, linear algebra, MATLAB, optimization

The Unit Simplex is defined by:

$$ \mathcal{S} = \left\{ x \in \mathbb{R}^{n} \mid x \succeq 0, \, \boldsymbol{1}^{T} x = 1 \right\} $$

Orthogonal Projection onto the Unit Simplex is defined by:

$$
\begin{alignat*}{3}
\arg \min_{x} & \quad & \frac{1}{2} \left\| x - y \right\|_{2}^{2} \\
\text{subject to} & \quad & x \succeq 0 \\
& \quad & \boldsymbol{1}^{T} x = 1
\end{alignat*}
$$

How could one solve this convex optimization problem?

Best Answer

The Projection onto the Simplex can be calculated as follows.
The Lagrangian in this case is given by:

$$ L \left( x, \mu \right) = \frac{1}{2} {\left\| x - y \right\|}^{2} + \mu \left( \boldsymbol{1}^{T} x - 1 \right) \tag1 $$

The trick is to leave the non-negativity constraint implicit.
Hence the Dual Function is given by:

\begin{align} g \left( \mu \right) & = \inf_{x \succeq 0} L \left( x, \mu \right) \\ & = \inf_{x \succeq 0} \sum_{i = 1}^{n} \left( \frac{1}{2} { \left( {x}_{i} - {y}_{i} \right) }^{2} + \mu {x}_{i} \right) - \mu && \text{Component-wise form} \tag2 \end{align}

Taking advantage of the component-wise form, the solution is given by:

\begin{align} x_i^\ast = \left(y_i-\mu\right)_+ \tag3 \end{align}

where the solution incorporates the non-negativity constraint by projecting onto $ {\mathbb{R}}_{+} $.

The solution is given by finding the $ \mu $ for which the constraint holds (pay attention: since the above is an equality constraint, $ \mu $ can take any value and is not limited to being non-negative like $ \lambda $).

The equation for $ \mu $ (from the KKT conditions) is given by:

\begin{align} 0 = h \left( \mu \right) = \sum_{i = 1}^{n} {x}_{i}^{\ast} - 1 & = \sum_{i = 1}^{n} { \left( {y}_{i} - \mu \right) }_{+} - 1 \tag4 \end{align}

The above is a piecewise linear function of $ \mu $.
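To make this concrete, here is a minimal MATLAB sketch for evaluating $ h \left( \mu \right) $ and the corresponding candidate $ x \left( \mu \right) $ (the vector `vY` and the function handle names are illustrative; the sketches below reuse them):

```matlab
% Example data: the point to be projected (any real vector works).
vY = randn(5, 1);

% Candidate projection x(mu) and constraint residual h(mu).
hProjFun = @(paramMu) max(vY - paramMu, 0);           % x_i = (y_i - mu)_+
hObjFun  = @(paramMu) sum(max(vY - paramMu, 0)) - 1;  % h(mu) = sum_i (y_i - mu)_+ - 1
```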

Since the function is continuous yet not differentiable everywhere due to its piecewise nature, theory says one must use derivative-free methods for root finding. One could use the Bisection Method, for instance, as sketched below.
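A bisection sketch, reusing `vY`, `hObjFun` and `hProjFun` from above (the bracket and iteration count are arbitrary but safe choices):

```matlab
% Bisection for the root of h(mu).
% Bracket: h(min(vY) - 1) >= n - 1 >= 0 while h(max(vY)) = -1 < 0.
muLow  = min(vY) - 1;
muHigh = max(vY);
for ii = 1:100
    muMid = 0.5 * (muLow + muHigh);
    if hObjFun(muMid) > 0
        muLow = muMid;  % h is decreasing, hence the root lies to the right
    else
        muHigh = muMid; % The root lies to the left (or exactly at muMid)
    end
end
vX = hProjFun(muMid); % Projection onto the Unit Simplex
```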

The derivative of the function is given by:

$$ \begin{align} \frac{\mathrm{d} }{\mathrm{d} \mu} h \left( \mu \right) & = \frac{\mathrm{d} }{\mathrm{d} \mu} \sum_{i = 1}^{n} { \left( {y}_{i} - \mu \right) }_{+} \\ & = \sum_{i = 1}^{n} -{ \mathbf{1} }_{\left\{ {y}_{i} - \mu > 0 \right\}} \end{align} $$

In practice, it can be solved using Newton iteration, since landing exactly on a joint between two linear segments has essentially zero probability.
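A Newton iteration sketch, again reusing `vY`, `hObjFun` and `hProjFun` (the starting point and tolerance are arbitrary choices; the derivative is $ -\#\left\{ i \mid {y}_{i} > \mu \right\} $ as above):

```matlab
% Newton iteration for h(mu) = 0 using the derivative derived above.
paramMu = mean(vY) - 1 / numel(vY); % Arbitrary starting point left of the root
for ii = 1:50
    hVal  = hObjFun(paramMu);
    hGrad = -sum(vY > paramMu);     % h'(mu) = -#{i : y_i > mu}
    if (abs(hVal) < 1e-10) || (hGrad == 0)
        break; % Converged, or all components clipped (fall back if needed)
    end
    paramMu = paramMu - hVal / hGrad; % Newton step
end
vX = hProjFun(paramMu); % Projection onto the Unit Simplex
```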

Accurate / Exact Solution

If we look at the values of the function $ h \left( \mu \right) = \sum_{i = 1}^{n} { \left( {y}_{i} - \mu \right) }_{+} - 1 $ one could easily infer a method to calculate the accurate solution:

[Plot of $ h \left( \mu \right) $ evaluated at the values of $ {y}_{i} $ and at two edge values]

In the plot above, the parameter $ \mu $ took the values of the vector $ y $ with two additional values at the edges (one greater than the maximum of $ {y}_{i} $ and one lower than the minimum of $ {y}_{i} $).
By iterating over these values one can easily locate the two consecutive values which bracket the root, namely where $ h \left( \mu \right) $ is greater than $ 0 $ on one side and lower than $ 0 $ on the other (in case one of them is zero, it is the optimal value of $ \mu $). Since the function is linear between these two points, the two points determine all parameters of the model $ y = a x + b $. Then the optimal value is $ \hat{\mu} = - \frac{b}{a} $.
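A sketch of this exact procedure in MATLAB, reusing `vY` and `hProjFun` (variable names are illustrative):

```matlab
% Exact solution: evaluate h at the sorted values of y plus an edge point
% on each side, locate the sign change, and solve the linear piece.
vMuGrid = sort([min(vY) - 1; vY(:); max(vY) + 1]);
vHVals  = arrayfun(@(paramMu) sum(max(vY - paramMu, 0)) - 1, vMuGrid);
firstNonPosIdx = find(vHVals <= 0, 1, 'first'); % h is decreasing in mu
if vHVals(firstNonPosIdx) == 0
    paramMu = vMuGrid(firstNonPosIdx); % A breakpoint is the exact root
else
    % On the bracketing interval h(mu) = a * mu + b, hence mu = -b / a.
    a = (vHVals(firstNonPosIdx) - vHVals(firstNonPosIdx - 1)) / ...
        (vMuGrid(firstNonPosIdx) - vMuGrid(firstNonPosIdx - 1));
    b = vHVals(firstNonPosIdx) - a * vMuGrid(firstNonPosIdx);
    paramMu = -b / a;
end
vX = hProjFun(paramMu); % Exact Projection onto the Unit Simplex
```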

I wrote MATLAB code which implements the method with Newton iteration at Mathematics StackExchange Question 2327504 - GitHub. I extended the method to the case $ \sum {x}_{i} = r, \; r > 0 $ (Pseudo Radius).
There is a test which compares the result to a reference calculated by CVX.
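For reference, the CVX baseline in such a test looks roughly like this (a sketch assuming CVX is installed; it is not necessarily identical to the code in the repository):

```matlab
% Reference solution of the projection problem via CVX.
numElements = numel(vY);
cvx_begin quiet
    variable vXRef(numElements)
    minimize( 0.5 * sum_square(vXRef - vY) )
    subject to
        vXRef >= 0;
        sum(vXRef) == 1;
cvx_end
```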