Minimize or Maximize Lagrangian

Tags: lagrange-multiplier, non-convex-optimization, nonlinear-optimization, optimization

Given the primal problem
$$\begin{aligned}
\min_{x} \quad & f(x)\\
\textrm{s.t.} \quad & l_i(x) \leq 0, i\in \{1,\dots,m\}\\
& h_i(x) = 0, i\in \{1,\dots,p\} \\
\end{aligned}\quad(\text{P1})$$
we define the Lagrangian as

$$L(x,\alpha,\lambda) = f(x) + \sum_{i=1}^{m}\alpha_il_i(x) + \sum_{j=1}^{p}\lambda_jh_j(x)\quad (\text{LF})$$ and the dual function as

$$g(\alpha, \lambda) = \min\limits_{x}L(x,\alpha,\lambda) \quad (\text{P2})$$ The associated dual problem will be defined as
$$\begin{aligned}
\max_{\alpha, \lambda} \quad & g(\alpha, \lambda)\\
\textrm{s.t.} \quad & \alpha \geq 0
\end{aligned}\quad(\text{P3})$$

  1. Does the convexity (convex or concave) of $L$ depend on the convexity of $f$, $l_i$, and $h_i$? Should we study it w.r.t. $x$ and $\xi=(\alpha, \lambda)$?
  2. In this post it is mentioned that $L(x, \alpha, \lambda)$ is concave. What is the meaning of minimizing a concave function?
  3. If we want to maximize (P1), what will (P2) and (P3) become? (I suppose (P1) has to be a concave problem so as to be maximized. Please correct me.)
  4. Does the order of minimizing and maximizing in (P2) and (P3), respectively, change according to the convexity of the entire problem (P1)?
  5. For example, what will be the order of optimization in (P2) and (P3) in the scenario below?

Scenario: Given that $f$ is a convex function, and
constraints $l_i$ and $h_i$ are such that (P1) is not convex. For
example, $l_i$ a linear constraint and $h_i$ a non-linear one (e.g., unit sphere normalization).

Thank you!

Best Answer

First, setting the stage: typically $f$ and each $l_i$ are assumed to be convex, and each $h_j$ is assumed to be affine. If you do not assume this, then (P2) will be hard to solve, but it still exists nevertheless. Now to the specific points:

  1. Yes, it does. $L(x, \alpha, \lambda)$ is convex in $x$ and concave in $(\alpha, \lambda)$. That $L(\cdot, \alpha, \lambda)$ is convex follows directly from the convexity of $f$ and of each $l_i$, together with the affineness of each $h_j$. Let's make this clear: fix $a \in \mathbb{R}^m$ with $a \geq 0$ and $b \in \mathbb{R}^p$, define $r(x) = L(x, a, b)$, and let $x, y \in \mathbb{R}^d$ and $t \in [0,1]$. Then \begin{align} r(tx + (1-t)y) &= f(tx + (1-t)y) + \sum_{i=1}^{m} a_i l_i (tx + (1-t)y) + \sum_{j=1}^{p} b_j h_j (tx + (1-t)y). \end{align} By the convexity of $f$, we have $$f(tx + (1-t)y) \leq t f(x) + (1-t) f(y).$$ Similarly, by the convexity of $l_i$ we have $$ l_i (tx + (1-t)y) \leq t l_i (x) + (1-t) l_i (y),$$ and since $a_i$ is assumed nonnegative, multiplying by it preserves this inequality. Finally, since $h_j$ is affine, we have equality: $$b_j h_j (tx + (1-t)y) = t b_j h_j (x) + (1-t) b_j h_j (y).$$ Combining the last three displays, we see that $$ r(tx + (1-t)y) \leq t r(x) + (1-t) r(y). $$ Hence, $L(\cdot, a, b)$ is convex. Similarly, if you fix $x$, then $L(x, \xi)$ is affine in $\xi = (\alpha, \lambda)$, and is therefore both convex and concave (as all affine functions are).
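Both claims can be checked numerically on a small instance. The problem data below ($f(x)=\|x\|^2$, one affine inequality constraint, one affine equality constraint) is purely illustrative and not taken from the question:

```python
import numpy as np

# Illustrative (assumed) problem data:
# f(x) = ||x||^2 (convex), l_1(x) = 1 - x_0 (convex), h_1(x) = x_0 + x_1 - 1 (affine)
def f(x): return float(x @ x)
def l1(x): return 1.0 - x[0]
def h1(x): return x[0] + x[1] - 1.0

def L(x, alpha, lam):
    return f(x) + alpha * l1(x) + lam * h1(x)

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    t = rng.uniform()
    a, b = rng.uniform(0, 5), rng.normal()  # alpha >= 0, lambda free

    # Convexity of L(., a, b) in x:
    assert L(t*x + (1-t)*y, a, b) <= t*L(x, a, b) + (1-t)*L(y, a, b) + 1e-9

    # Affineness of L(x, .) in (alpha, lambda): exact equality at convex combinations
    a2, b2 = rng.uniform(0, 5), rng.normal()
    lhs = L(x, t*a + (1-t)*a2, t*b + (1-t)*b2)
    rhs = t*L(x, a, b) + (1-t)*L(x, a2, b2)
    assert abs(lhs - rhs) <= 1e-9

print("convex in x, affine in (alpha, lambda): checks passed")
```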

  2. The definition of the dual function $g(\alpha, \lambda)$ minimizes $L(x, \alpha, \lambda)$ over $x$, and, as shown above, $L$ is convex in $x$, so this is convex minimization. Therefore, this minimization is well-posed (the value may be $-\infty$ for some $(\alpha, \lambda)$, which is allowed for the dual function).
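For intuition, the inner minimization can be carried out explicitly on a toy instance (assumed here for illustration): minimize $f(x)=x^2$ subject to $l(x)=1-x \leq 0$. Then $L(x,\alpha)=x^2+\alpha(1-x)$ is minimized at $x^*=\alpha/2$, giving $g(\alpha)=\alpha-\alpha^2/4$:

```python
from scipy.optimize import minimize_scalar

def g_numeric(alpha):
    # Dual function evaluated numerically: g(alpha) = min_x [x^2 + alpha*(1 - x)]
    res = minimize_scalar(lambda x: x**2 + alpha * (1.0 - x))
    return res.fun

def g_closed(alpha):
    # The minimizer x* = alpha/2 gives g(alpha) = alpha - alpha^2/4
    return alpha - alpha**2 / 4.0

for a in [0.0, 0.5, 2.0, 7.0]:
    assert abs(g_numeric(a) - g_closed(a)) < 1e-6

print("numeric and closed-form dual function agree")
```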

  3. Optimizing (P1) is a different problem from (P2) and (P3). These problems are generally not equivalent, but under certain conditions (such as Slater's condition) they are. In that case you can either solve the primal problem directly (optimize (P1) using some form of projected gradient descent, for example) or solve the dual problem by solving (P3), and in the process solve (P2), possibly multiple times. The dual optimization problem is $$ \max_{\alpha, \lambda} \min_{x} L(x, \alpha, \lambda). $$ If you assume access to $g(\alpha, \lambda) = \min_{x} L(x, \alpha, \lambda)$, then this is a concave maximization problem, since $g(\alpha, \lambda)$ is concave regardless of the convexity of $L(\cdot, \alpha, \lambda)$: it is a pointwise minimum of functions that are affine in $(\alpha, \lambda)$. Of course, if $L(\cdot, \alpha, \lambda)$ is not convex, then it will be hard to compute the value of $g(\alpha, \lambda)$ for a given $(\alpha, \lambda)$, and the dual problem may be hard.
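The max-min structure suggests a standard scheme, dual (projected gradient) ascent: at each step, solve the inner minimization for the current multipliers, then take an ascent step on $g$ using the constraint values at the minimizer. Here is a minimal sketch on an assumed toy instance, $\min x^2$ s.t. $1-x \leq 0$, where the inner problem has a closed-form solution $x^*(\alpha)=\alpha/2$:

```python
# Toy instance (assumed for illustration): min x^2  s.t.  1 - x <= 0.
# Primal optimum: x* = 1 with f(x*) = 1; dual optimum: alpha* = 2 with g(2) = 1.
def dual_ascent(steps=200, eta=0.5):
    alpha = 0.0
    for _ in range(steps):
        x_star = alpha / 2.0                  # argmin_x [x^2 + alpha*(1 - x)], closed form
        grad = 1.0 - x_star                   # dg/dalpha equals l(x*(alpha)) here
        alpha = max(0.0, alpha + eta * grad)  # projected ascent step, keeps alpha >= 0
    return alpha, alpha / 2.0

alpha_star, x_star = dual_ascent()
print(alpha_star, x_star)  # converges to alpha* = 2, x* = 1 (no duality gap here)
```

Note that with Slater's condition satisfied (e.g., $x=2$ is strictly feasible), the dual optimum matches the primal optimum.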

  4. No.

  5. I assume you meant $h_i$ linear and $l_i$ nonlinear but convex, because $h_i$ must be affine in this formulation. In that case, you would solve (P3), which requires solving (P2), possibly many times for different values of $(\alpha, \lambda)$. If instead (P1) really is non-convex (e.g., $h_i$ is a genuinely nonlinear constraint such as the unit-sphere normalization), then solving (P2) cannot generally be done and you must exploit the problem structure in some way. If you can find a closed-form solution for (P2) in terms of $(\alpha, \lambda)$, then you can use it to solve (P3).
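As a sketch of the closed-form route, consider an assumed equality-constrained instance, $\min \tfrac{1}{2}\|x\|^2$ s.t. $Ax=b$: here $\nabla_x L = x + A^\top\lambda = 0$ gives $x^*(\lambda) = -A^\top\lambda$, so $g(\lambda) = -\tfrac{1}{2}\|A^\top\lambda\|^2 - b^\top\lambda$, and (P3) is solved by a linear system:

```python
import numpy as np

# Assumed toy problem: min (1/2)||x||^2  s.t.  A x = b.
# L(x, lam) = (1/2)||x||^2 + lam @ (A x - b);  grad_x L = x + A.T @ lam = 0
# => x*(lam) = -A.T @ lam, giving the closed-form dual g(lam) = -(1/2)||A.T lam||^2 - b @ lam.
rng = np.random.default_rng(1)
A = rng.normal(size=(2, 4))   # underdetermined system, full row rank a.s.
b = rng.normal(size=2)

# Maximizing g: grad g = -A @ A.T @ lam - b = 0  =>  lam* solves (A A^T) lam = -b
lam_star = -np.linalg.solve(A @ A.T, b)
x_star = -A.T @ lam_star

# Sanity check: the primal optimum is the minimum-norm solution of A x = b,
# which lstsq returns for underdetermined systems.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_star, x_lstsq)
assert np.allclose(A @ x_star, b)
print("closed-form dual recovers the minimum-norm primal solution")
```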

I hope this answers your questions.
