Reason for $\log$ transformation in geometric programming

convex-optimization, geometric-programming, nonlinear-optimization, optimization

I have read about the transformation of geometric programs into convex programs, both in Boyd's book and in his geometric programming tutorial. I understand that the transformation indeed produces a convex program. However, I do not understand why we apply the logarithm to the objective function and to the functions in the inequality constraints.

Instead of minimizing the objective $f_0$, we minimize its logarithm $\log f_0$. We replace the inequality constraints $f_i \leq 1$ with $\log f_i \leq 0$, and the equality constraints $g_i=1$ with $\log g_i=0$.
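(For concreteness, here is a minimal sketch of such a GP in CVXPY's geometric-programming mode; `solve(gp=True)` applies exactly this log transformation internally, and the problem data below is made up for illustration.)

```python
import cvxpy as cp

x = cp.Variable(pos=True)
y = cp.Variable(pos=True)

prob = cp.Problem(
    cp.Minimize(x + y),   # posynomial objective f_0
    [x / y <= 1,          # inequality constraint f_1 <= 1
     x * y == 1],         # monomial equality constraint g_1 == 1
)
prob.solve(gp=True)       # log transformation handled internally
print(x.value, y.value, prob.value)   # expect x = y = 1, optimal value 2
```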

I understand that this is necessary for the functions in the equality constraints, because there we need affine functions. However, the other functions need only be convex, and these functions are already convex before applying the $\log$ (for one cannot turn a nonconvex function into a convex one by applying an increasing concave function).

What is the reason for the $\log$s then? Is it just to get better numerical properties?

Best Answer

A GP in standard form contains posynomial functions: $$f(x) = \sum_i c_i \prod_j x_j^{a_{ij}} = \sum_i \exp\left( \log c_i + \sum_j a_{ij} \log x_j\right),$$ with positive coefficients $c_i$. If you substitute $y_j = \log x_j$, you get $$f(y) = \sum_i \exp\left( \log c_i + \sum_j a_{ij} y_j\right).$$ This is indeed convex. Your question is why we take the logarithm of $f$: $$g(y) = \log f(y) = \log \sum_i \exp\left( \log c_i + \sum_j a_{ij} y_j\right).$$
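A quick numerical check of this identity, as a sketch with made-up data $c$ and $A$ (rows of $A$ index the monomial terms, columns the variables):

```python
import numpy as np
from scipy.special import logsumexp

# Made-up problem data: positive coefficients c_i and exponent matrix A.
c = np.array([2.0, 0.5, 1.0])
A = np.array([[1.0, 2.0],
              [0.5, -1.0],
              [3.0, 0.0]])

def f(x):
    """Posynomial in the original variables: sum_i c_i * prod_j x_j**a_ij."""
    return np.sum(c * np.prod(x ** A, axis=1))

def g(y):
    """Log of the posynomial after y = log(x): a log-sum-exp, hence convex."""
    return logsumexp(A @ y + np.log(c))

x = np.array([1.3, 0.7])
print(f(x), np.exp(g(np.log(x))))  # both print the same value: f(x) = exp(g(log x))
```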

There is just one reason. If you solve the primal GP (as in Boyd & Vandenberghe, p. 573), the problem is well scaled: if you scale all $y_j$ by a factor of 2, then $g$ also scales by roughly a factor of 2, and the gradient and Hessian (which is what many optimization algorithms rely on) stay well scaled too. In the original problem, you run into numerical issues as soon as the argument of the exponential function is 1 in one term and 50 in another: the terms then differ by more than 20 orders of magnitude. Solving the primal problem does have a major drawback: taking the logarithm of the function values can be detrimental to the sparsity pattern of the Hessian.
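To see the scaling issue concretely, here is a small sketch comparing the untransformed sum of exponentials with its logarithm (computed stably via SciPy's `logsumexp`); the arguments 1 and 50 are the ones mentioned above:

```python
import numpy as np
from scipy.special import logsumexp

# Arguments of the exponential terms, as in the example above.
z = np.array([1.0, 50.0])
print(np.exp(z).sum())   # f ~ 5.2e21: gradient and Hessian entries blow up similarly
print(logsumexp(z))      # g = log f ~ 50.0: well scaled

# With slightly larger arguments, the untransformed f overflows outright:
z = np.array([1.0, 800.0])
print(np.exp(z).sum())   # inf (overflow in double precision, with a runtime warning)
print(logsumexp(z))      # 800.0, still perfectly representable
```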

However, there is also a good reason not to apply the transformation, and that is for deriving the dual problem. This dual (which cannot be derived from a GP in convex form) has a 2-self-concordant logarithmic barrier and good sparsity properties. Self-concordance is mostly a theoretical result, showing that a GP can be solved in polynomial time, but the sparsity is useful when solving large instances.
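For reference, in the unconstrained case $\min_x f(x)$ with $f$ the posynomial above, this classical dual (in the sense of Duffin, Peterson and Zener; sketched here without the refinements needed for constrained GPs) reads $$\max_{\delta > 0} \;\prod_i \left(\frac{c_i}{\delta_i}\right)^{\delta_i} \quad \text{subject to} \quad \sum_i \delta_i = 1, \qquad \sum_i \delta_i\, a_{ij} = 0 \ \text{ for all } j,$$ whose logarithm $\sum_i \delta_i \log(c_i/\delta_i)$ is a concave, entropy-like function of $\delta$.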