[Math] a good technique to decide step size in sub-gradient method for dual decomposition

convex-optimization, optimization

I am looking at the following paper to implement dual decomposition for my algorithm:
http://www.csd.uoc.gr/~komod/publications/docs/DualDecomposition_PAMI.pdf

On p. 29 they suggest setting the step size for the sub-gradient method by taking the difference between the best primal value found so far and the current dual value, divided by the L2-norm of the sub-gradient at the current iteration.

My question is the following: do I compute a sub-gradient for each slave problem and use a different step size for each one? Or is there some way to compute the sub-gradient of the combined dual problem?

Best Answer

Step sizes are the crucial and difficult point when using subgradient methods. Basically, you need the step sizes to tend to zero, but not too fast. If one uses a-priori step sizes (e.g. of the form $1/k$), then the method provably converges, but in practice it slows down so much that you will not observe convergence.
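To illustrate the a-priori rule, here is a minimal sketch (not from the paper) of subgradient descent with $1/k$ step sizes on the nonsmooth convex function $f(x) = \|x - b\|_1$; the vector $b$ and the problem are just an invented toy example:

```python
import numpy as np

# Toy problem (assumed for illustration): minimize f(x) = ||x - b||_1.
# A subgradient of f at x is sign(x - b).
b = np.array([1.0, -2.0, 3.0])
f = lambda x: np.abs(x - b).sum()

x = np.zeros(3)
best_f = f(x)
for k in range(1, 5001):
    g = np.sign(x - b)           # a subgradient of f at x
    x = x - (1.0 / k) * g        # a-priori diminishing, non-summable steps
    best_f = min(best_f, f(x))   # subgradient methods are not descent
                                 # methods, so track the best value seen

print(best_f)  # near the optimum f(b) = 0, but convergence is slow
```

Even on this tiny problem the iterates oscillate around the optimum with slowly shrinking amplitude, which is the slowdown described above.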

The dynamic rule they suggest in the paper (in equation (40)) looks like the so-called Polyak step size, with the true optimal value replaced by an estimate (obtained here from values of the dual problem). One can prove convergence with these step sizes under special conditions. I do not know a good reference off the top of my head, but many books on (nonsmooth) convex optimization should treat this.
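For comparison, here is a hedged sketch of the Polyak-type rule on the same kind of toy problem, $f(x) = \|x - b\|_1$: the step is $\alpha_k = (f(x_k) - f^\ast)/\|g_k\|^2$. In this idealized example the optimal value $f^\ast = 0$ is known exactly; in the dual-decomposition setting one would substitute an estimate such as a dual bound:

```python
import numpy as np

# Toy problem (assumed for illustration): minimize f(x) = ||x - b||_1,
# whose optimal value f_star = 0 is known here.  In practice f_star
# would be an estimate, e.g. from the dual problem.
b = np.array([1.0, -2.0, 3.0])
f = lambda x: np.abs(x - b).sum()
f_star = 0.0

x = np.zeros(3)
best_f = f(x)
for k in range(200):
    g = np.sign(x - b)                   # a subgradient of f at x
    if f(x) - f_star <= 1e-12 or not g.any():
        break                            # (numerically) optimal
    alpha = (f(x) - f_star) / (g @ g)    # Polyak step size
    x = x - alpha * g
    best_f = min(best_f, f(x))

print(best_f)
```

With the exact optimal value, the Polyak rule adapts the step to the remaining gap and reaches the optimum in a handful of iterations on this example; with an estimated $f^\ast$ (as in the paper) the behavior degrades gracefully but convergence guarantees require the conditions mentioned above.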
