Solved – Why are optimization algorithms defined in terms of other optimization problems

machine-learning, optimization, svm

I am doing some research on optimization techniques for machine learning, but I am surprised to find that a large number of optimization algorithms are defined in terms of other optimization problems. I illustrate some examples below.

For example https://arxiv.org/pdf/1511.05133v1.pdf

[algorithm from the paper, shown as an image in the original post]

Everything looks nice and good, but then there is this $\text{argmin}_x$ in the $z^{k+1}$ update… so what is the algorithm that solves for the $\text{argmin}$? We don't know, and the paper doesn't say. So magically we are supposed to solve another optimization problem: find the minimizing vector so that the inner product is at its minimum. How can this be done?

Take another example: https://arxiv.org/pdf/1609.05713v1.pdf

[algorithm from the paper, shown as an image in the original post]

Everything looks nice and good until you hit that proximal operator in the middle of the algorithm. What is the definition of that operator?

Boom: [image of the definition in the original post: the usual proximal operator, $\text{prox}_{\lambda f}(v) = \text{argmin}_x \big( f(x) + \tfrac{1}{2\lambda}\|x - v\|_2^2 \big)$]

Now pray tell, how do we solve this $\text{argmin}_x$ in the proximal operator? It doesn't say. In any case, that optimization problem looks hard (possibly NP-hard) depending on what $f$ is.
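For what it's worth, the one special case I can work out by hand is $f(x) = \lambda\|x\|_1$, where that $\text{argmin}$ has a well-known closed form (soft thresholding), so no inner solver is needed. A quick sketch of my own (not from either paper):

```python
import numpy as np

# The one case I can do by hand: f(x) = lam * ||x||_1.  Then
#     prox_f(v) = argmin_x ( lam * ||x||_1 + 0.5 * ||x - v||_2^2 )
# has the well-known closed-form solution: soft thresholding (no inner solver needed).
def prox_l1(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

print(prox_l1(np.array([3.0, -0.2, 0.7]), lam=0.5))   # -> [ 2.5 -0.   0.2]
```

But for a general $f$ I still don't see what we are supposed to do.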

Can someone please enlighten me as to:

  1. Why are so many optimization algorithms defined in terms of other optimization problems?

(Wouldn't this be some sort of chicken-and-egg problem? To solve problem 1, you need to solve problem 2, using a method of solving problem 3, which relies on solving problem ….)

  2. How do you solve the optimization problems that are embedded in these algorithms? For example, given $x^{k+1} = \text{argmin}_x \text{(really complicated loss function)}$, how do you find the minimizer on the right-hand side?

  3. Ultimately, I am puzzled as to how these algorithms can be implemented numerically. I recognize that adding and multiplying vectors are easy operations in Python, but what about $\text{argmin}_x$: is there some function (script) that magically gives you the minimizer of a function? (A sketch of what I mean follows this list.)
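The closest thing I know of is a generic local solver such as scipy.optimize.minimize, which is of course itself just another iterative optimization algorithm, not magic. A toy sketch of my own (the loss below is made up):

```python
import numpy as np
from scipy.optimize import minimize

# A made-up "really complicated" (but smooth) loss, purely for illustration.
def loss(x):
    return np.sum((x - 3.0) ** 2) + 0.1 * np.sum(np.cos(5.0 * x))

x0 = np.zeros(4)                              # current iterate / starting guess
res = minimize(loss, x0, method="L-BFGS-B")   # iterative local solver, not a magic argmin
x_next = res.x                                # plays the role of x^{k+1} = argmin_x loss(x)
print(x_next, res.fun)
```

But that only returns a local minimizer, and only for problems it can handle, which is exactly why I am asking how the papers intend these steps to be carried out.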

(Bounty: can anyone reference a paper in which the authors make clear the algorithm used for the sub-problem embedded in the high-level optimization algorithm?)

Best Answer

You are looking at top-level algorithm flow charts. Some of the individual steps in such a flow chart may merit their own detailed flow charts. However, in published papers, which place an emphasis on brevity, many details are often omitted. Details for standard inner optimization problems, which are considered to be "old hat", may not be provided at all.

The general idea is that an optimization algorithm may require the solution of a series of generally easier optimization problems. It's not uncommon to have 3 or even 4 levels of optimization algorithms nested within a top-level algorithm, although some of them are internal to standard optimizers.

Even deciding when to terminate an algorithm (at one of the hierarchical levels) may require solving a side optimization problem. For instance, a non-negatively constrained linear least squares problem might be solved to determine the Lagrange multipliers used to evaluate the KKT optimality score, which in turn is used to decide when to declare optimality.
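A minimal sketch of that side problem, with made-up numbers and under one common sign convention (inequality constraints written as $g_i(x) \ge 0$); this is an illustration, not code from any particular solver:

```python
import numpy as np
from scipy.optimize import nnls

# Near a candidate optimum, KKT stationarity requires
#     grad f(x)  ~  sum_i lambda_i * grad g_i(x),   with lambda_i >= 0,
# over the active inequality constraints g_i(x) >= 0.  Estimating the multipliers
# is itself a non-negative least squares problem; the residual norm acts as the
# optimality score used to decide whether to stop.
grad_f = np.array([2.0, 1.0])            # gradient of the objective at the current iterate
G = np.array([[1.0, 0.0],                # rows: gradients of the active constraints
              [1.0, 1.0]])
lam, kkt_residual = nnls(G.T, grad_f)    # solves  min_{lam >= 0} || G^T lam - grad_f ||_2
print(lam, kkt_residual)                 # small residual  ->  declare (near-)optimality
```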

If the optimization problem is stochastic or dynamic, there may be yet additional hierarchical levels of optimization.

Here's an example: Sequential Quadratic Programming (SQP). The original nonlinear optimization problem is treated by iteratively solving the Karush-Kuhn-Tucker (KKT) optimality conditions: starting from the current iterate, a subproblem is formed whose objective is a quadratic approximation of the Lagrangian and whose constraints are linearizations of the original constraints. The resulting Quadratic Program (QP) is then solved. Either that QP carries trust-region constraints, or a line search, which is itself an optimization problem, is conducted from the current iterate toward the QP solution in order to find the next iterate. If a Quasi-Newton method is being used, an optimization problem has to be solved to determine the Quasi-Newton update to the Hessian of the Lagrangian; usually this is done in closed form via formulas such as BFGS or SR1, but it could be a numerical optimization. Then the new QP is solved, and so on. If the QP is ever infeasible, including at the start, yet another optimization problem is solved to find a feasible point. Meanwhile, there may be one or two levels of internal optimization problems being solved inside the QP solver itself. At the end of each iteration, a non-negative linear least squares problem might be solved to compute the optimality score. Etc.
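To make the nesting concrete, here is a stripped-down sketch of an SQP-style loop on a tiny equality-constrained toy problem of my own invention, where the inner QP reduces to a single KKT linear solve. A real SQP code would replace that inner routine with a full QP solver (which has its own internal iterations), plus a line search or trust region:

```python
import numpy as np

# Toy problem (mine):  minimize x1^2 + x2^2   subject to   x1 + x2 - 1 = 0.
def f_grad(x):        return 2.0 * x                   # gradient of the objective
def c_val(x):         return np.array([x[0] + x[1] - 1.0])
def c_jac(x):         return np.array([[1.0, 1.0]])    # Jacobian of the constraints
def lag_hess(x, lam): return 2.0 * np.eye(2)           # Hessian of the Lagrangian (exact here)

def solve_qp(B, g, A, c):
    """Inner solver for the QP subproblem
           min_p  0.5 p^T B p + g^T p   s.t.   A p + c = 0,
       which with only equality constraints reduces to one KKT linear system.
       A real SQP code would call a full QP solver here."""
    n, m = B.shape[0], A.shape[0]
    K = np.block([[B, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-g, -c]))
    return sol[:n], sol[n:]                            # step p, new multiplier estimate

x, lam = np.array([5.0, -3.0]), np.zeros(1)            # arbitrary starting point
for k in range(20):                                    # outer SQP loop
    p, lam = solve_qp(lag_hess(x, lam), f_grad(x), c_jac(x), c_val(x))
    x = x + p                                          # (a real code would line-search here)
    if np.linalg.norm(p) < 1e-10:
        break
print(x)                                               # -> [0.5 0.5], the constrained minimizer
```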

If this is a mixed integer problem, then this whole shebang might be performed at each branching node as part of a higher-level algorithm. Similarly for a global optimizer: a local optimization problem is solved to produce an upper bound on the globally optimal objective value, and then a relaxation of some constraints yields a lower-bounding optimization problem. Thousands or even millions of "easy" optimization problems from branch and bound might be solved in order to solve one mixed integer or global optimization problem.
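Here is a hedged sketch of that pattern: a bare-bones branch-and-bound loop for a tiny made-up integer program, where each node calls an LP relaxation (the "easy" subproblem) via scipy.optimize.linprog:

```python
import numpy as np
from scipy.optimize import linprog

# Made-up toy integer program (for illustration only):
#   maximize 5*x1 + 4*x2   s.t.   6*x1 + 4*x2 <= 24,   x1 + 2*x2 <= 6,
#   x1, x2 integer and >= 0.   (linprog minimizes, so the objective is negated.)
c = [-5.0, -4.0]
A_ub = [[6.0, 4.0], [1.0, 2.0]]
b_ub = [24.0, 6.0]

best_val, best_x = np.inf, None
stack = [((0, None), (0, None))]          # each node = per-variable (lower, upper) bounds
while stack:
    bounds = stack.pop()
    relax = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)   # the "easy" LP relaxation
    if not relax.success or relax.fun >= best_val:
        continue                          # prune: infeasible node, or bound no better than incumbent
    frac = [i for i, v in enumerate(relax.x) if abs(v - round(v)) > 1e-6]
    if not frac:                          # LP solution already integral: new incumbent
        best_val, best_x = relax.fun, relax.x
        continue
    i = frac[0]                           # branch on the first fractional variable
    lo, hi = bounds[i]
    v = relax.x[i]
    left, right = list(bounds), list(bounds)
    left[i] = (lo, np.floor(v))           # child 1:  x_i <= floor(v)
    right[i] = (np.ceil(v), hi)           # child 2:  x_i >= ceil(v)
    stack += [tuple(left), tuple(right)]

print(best_x, -best_val)                  # expected integer optimum: x = [4, 0], value 20
```

Each node's LP is trivial here, but a mixed-integer nonlinear solver would be running the whole SQP machinery above at every node.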

This should start to give you an idea.

Edit: In response to the chicken-and-egg question, which was added to the question after my answer: if there's a chicken-and-egg problem, then it's not a well-defined practical algorithm. In the examples I gave, there is no chicken and egg. Higher-level algorithm steps invoke optimization solvers that are either defined or already exist. SQP iteratively invokes a QP solver to solve sub-problems, but the QP solver solves an easier class of problem (QP) than the original problem. If there is an even higher-level global optimization algorithm, it may invoke an SQP solver to solve local nonlinear optimization subproblems, and the SQP solver in turn calls a QP solver to solve QP subproblems. No chicken and egg.

Note: Optimization opportunities are "everywhere". Optimization experts, such as those developing optimization algorithms, are more likely to see these optimization opportunities, and view them as such, than the average Joe or Jane. And being algorithmically inclined, quite naturally they see opportunities for building up optimization algorithms out of lower-level optimization algorithms. Formulation and solution of optimization problems serve as building blocks for other (higher level) optimization algorithms.

Edit 2: In response to the bounty request just added by the OP: the paper describing the SQP nonlinear optimizer SNOPT, https://web.stanford.edu/group/SOL/reports/snopt.pdf, specifically mentions that the QP solver SQOPT, which is documented separately, is used to solve the QP subproblems in SNOPT.
