What is the intuition behind Slater's condition in optimization? (And other constraint qualifications.)

convex-optimization, convex-analysis, intuition, optimization

I would like to "grok" Slater's condition and other constraint qualification conditions in optimization.

Slater's condition is only one of many different constraint qualifications in the optimization literature. Which one is the most fundamental? Which one tells me "what's really going on"? What is the basic idea at the heart of this?

Also, constraint qualifications appear in both convex and non-convex optimization. Is there a unifying viewpoint that shows it is the same simple, basic idea in all cases?

I'd be interested in any insights or viewpoints that lead to a deeper understanding of constraint qualifications in optimization.


Edit: Here is one possible viewpoint. Buried on p. 223 (chapter 23) of Rockafellar's Convex Analysis, we find the following fundamental and vital fact.

Let $f_1,\ldots,f_m$ be proper convex functions on $\mathbb R^n$, and let $f = f_1 + \cdots + f_m$. If the convex sets $\text{ri}(\text{dom } f_i), i = 1,\ldots,m$, have a point in common, then $$ \partial f(x) = \partial f_1(x) + \cdots + \partial f_m(x). $$ This condition for equality can be weakened slightly if certain of the functions, say $f_1,\ldots, f_k$, are polyhedral: then it is enough if the sets $\text{dom } f_i, i = 1,\ldots,k$ and $\text{ri}(\text{dom } f_i), i = k+1,\ldots,m$ have a point in common.
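
A standard one-variable example (not from Rockafellar's book, but folklore) shows that the relative-interior condition cannot simply be dropped. Take
$$
f_1(x) = \begin{cases} -\sqrt{x}, & x \geq 0 \\ +\infty, & x < 0 \end{cases}
\qquad \text{and} \qquad
f_2 = I_{(-\infty, 0]},
$$
the indicator of $(-\infty, 0]$. Then $\text{ri}(\text{dom } f_1) = (0,\infty)$ and $\text{ri}(\text{dom } f_2) = (-\infty,0)$ are disjoint. At $x = 0$ we have $\partial(f_1+f_2)(0) = \mathbb R$ (since $f_1 + f_2$ is finite only at $0$), while $\partial f_1(0) = \emptyset$ and $\partial f_2(0) = [0,\infty)$, so
$$
\partial f_1(0) + \partial f_2(0) = \emptyset \neq \mathbb R = \partial (f_1+f_2)(0).
$$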

This subdifferential sum rule can be used to derive optimality conditions for various convex optimization problems, including the KKT conditions for convex problems. For example, the optimization problem

\begin{align}
\text{minimize} & \quad f(x) \\
\text{subject to } & \quad x \in C
\end{align}
where $f$ is a closed convex function and $C$ is a closed convex set, is equivalent to the problem
$$
\text{minimize} \quad f(x) + I_C(x)
$$
where $I_C$ is the indicator function for $C$.
The optimality condition for this problem is
$$
0 \in \partial (f + I_C)(x) = \partial f(x) + \partial I_C(x),
$$
but for the equality to be valid the "overlapping relative interior" condition must be satisfied. So, we need the relative interior of $C$ to have a point in common with the relative interior of $\text{dom } f$. This is a "constraint qualification" for the problem of minimizing $f$ subject to the constraint that $x \in C$.
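
To make this condition concrete: $\partial I_C(x)$ is the normal cone $N_C(x) = \{g : g^T(y-x) \leq 0 \ \forall y \in C\}$, so when $f$ is differentiable the optimality condition $0 \in \partial f(x) + \partial I_C(x)$ reads
$$
-\nabla f(x) \in N_C(x), \qquad \text{i.e.,} \qquad \nabla f(x)^T (y - x) \geq 0 \ \text{ for all } y \in C,
$$
which is the usual first-order condition for minimizing $f$ over $C$.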

So is this "overlapping relative interiors" condition appearing in the subdifferential sum rule the ultimate, most fundamental constraint qualification?

Can Slater's condition be viewed as a special case of this "overlapping relative interior" condition?

The "overlapping relative interior" condition apparently has nothing to do with non-convex optimization problems. Is there a unifying viewpoint that applies to both convex and non-convex problems?

Best Answer

Slater's condition relates to the existence of Lagrange multipliers in a convex program. Lagrange multipliers exist when there is a nonvertical supporting hyperplane passing through a point on the boundary of a related convex set. Intuitively, a nonvertical supporting hyperplane can fail to exist only when the constraints of the convex program can be satisfied only at "edge"-type points of that set; Slater's condition ensures the constraints can be satisfied at "non-edge"-type points.

Discussions with pictures can be found in various places, such as Convex Analysis and Optimization by Bertsekas, Nedic, and Ozdaglar. The Slater result is very simple to derive, and below I summarize a derivation that may also give intuition.


The convex program

Mathematically we have a general convex program:

\begin{align}
\mbox{Minimize:} \quad & f(x) \\
\mbox{Subject to:} \quad & g_i(x) \leq 0 \quad \forall i \in \{1, \ldots, k\} \\
& x \in \mathcal{X}
\end{align}
where $\mathcal{X} \subseteq \mathbb{R}^n$ is a convex set and $f, g_1, \ldots, g_k$ are convex functions defined over $\mathcal{X}$. Let us suppose there is (at least one) $x^* \in \mathcal{X}$ that is an optimal solution to the above convex program.

Lagrange multipliers

Define the set $\mathcal{A}$: $$ \mathcal{A} = \{(y_0, y_1, ..., y_k) : y_0 \geq f(x), y_1 \geq g_1(x), ..., y_k \geq g_k(x) \mbox{ for some $x \in \mathcal{X}$} \}$$

It can be shown that $\mathcal{A}$ is a convex set and the point $(f(x^*), 0, 0, ..., 0)$ is a boundary point of the set. By the hyperplane separation theorem it follows there are real numbers $\theta_0, \theta_1, ..., \theta_k$, not all zero, such that $$ \theta_0y_0 + \sum_{i=1}^k \theta_i y_i \geq \theta_0f(x^*) \quad \forall (y_0, ..., y_k) \in \mathcal{A} $$ By nature of the set $\mathcal{A}$ (namely, any point that dominates a point in $\mathcal{A}$ is also in $\mathcal{A}$), it is easy to show the above can only hold if $\theta_i \geq 0$ for all $i \in \{0, 1, ..., k\}$. Since for any $x \in \mathcal{X}$ we have $(f(x), g_1(x), ..., g_k(x)) \in \mathcal{A}$, a special case of the above inequality is: $$ \theta_0f(x) + \sum_{i=1}^k \theta_i g_i(x) \geq \theta_0f(x^*) \quad \forall x \in \mathcal{X} \quad (Eq. 1)$$
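
(To see why $\theta_i \geq 0$ is forced: if some $\theta_j < 0$, fix any $(y_0, \ldots, y_k) \in \mathcal{A}$ and send $y_j \to +\infty$; the resulting points remain in $\mathcal{A}$ by the dominance property, yet $\theta_0 y_0 + \sum_{i=1}^k \theta_i y_i \to -\infty$, violating the separation inequality. The same argument applied to $y_0$ gives $\theta_0 \geq 0$.)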

We say the hyperplane is nonvertical if $\theta_0 \neq 0$, in which case we can define $\mu_i = \theta_i/\theta_0 \geq 0$ for all $i \in \{1, ..., k\}$ and we call these Lagrange multipliers because they are nonnegative and: $$ \theta_0 \neq 0 \implies f(x) + \sum_{i=1}^k \mu_i g_i(x) \geq f(x^*) \quad, \forall x \in \mathcal{X}$$ Assuming $\theta_0 \neq 0$, the above inequality implies that $x^*$ is a minimizer of $f(x) + \sum_{i=1}^k \mu_ig_i(x)$ over all $x \in \mathcal{X}$, the minimum value is $f(x^*)$, and $\mu_i g_i(x^*)=0$ for all $i \in \{1, ..., k\}$.
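
The last two assertions follow by plugging $x = x^*$ into the displayed inequality:
$$
f(x^*) + \sum_{i=1}^k \mu_i g_i(x^*) \geq f(x^*) \implies \sum_{i=1}^k \mu_i g_i(x^*) \geq 0,
$$
while $\mu_i \geq 0$ and $g_i(x^*) \leq 0$ make every term $\mu_i g_i(x^*) \leq 0$; hence each term is zero and the minimum of $f(x) + \sum_{i=1}^k \mu_i g_i(x)$ over $\mathcal{X}$ is attained at $x^*$ with value $f(x^*)$.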

Slater's condition

Slater's condition: Suppose there is an $s \in \mathcal{X}$ such that $g_i(s) < 0$ for all $i \in \{1, ..., k\}$. (So all constraints can be satisfied simultaneously with strict inequality, i.e., with positive slack.)

Claim: If Slater's condition holds, then $\theta_0 \neq 0$ and hence Lagrange multipliers exist.

Proof: Assume $\theta_0=0$ and plug $s \in \mathcal{X}$ into Eq. 1 to get a contradiction (recalling that not all $\theta_i$ are zero). $\Box$

I do not fill in the details here since I sometimes give these details as a homework exercise (Exercise VIII-B.8 on page 40 here):

http://ee.usc.edu/stochastic-nets/docs/network-optimization-notes.pdf

Fig. 19 on page 55 of those notes gives a simple counter-example where a nonvertical hyperplane does not exist, and this is also useful to get intuition on Slater's condition. These examples are standard and similar ones can be found in the Bertsekas book.
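
A standard one-variable instance of this failure (not necessarily the example drawn in Fig. 19) is
$$
\mbox{Minimize: } x \quad \mbox{Subject to: } x^2 \leq 0, \ x \in \mathcal{X} = \mathbb{R}.
$$
The only feasible point is $x^* = 0$, but no $\mu \geq 0$ satisfies $x + \mu x^2 \geq 0$ for all $x \in \mathbb{R}$ (for $\mu > 0$ take $x = -1/(2\mu)$, which gives $-1/(4\mu) < 0$; for $\mu = 0$ take $x = -1$). Slater's condition fails because $g(x) = x^2$ cannot be made strictly negative, and here the set $\mathcal{A} = \{(y_0, y_1) : y_1 \geq 0, \ y_0 \geq -\sqrt{y_1}\}$ admits only the vertical supporting hyperplane $y_1 = 0$ at the boundary point $(0, 0)$.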

Further intuition

Going back to the set $\mathcal{A}$ and the boundary point $(f(x^*), 0, 0, ..., 0)$, intuitively we see that if Slater's condition holds then any vertical hyperplane that passes through this boundary point would chop $\mathcal{A}$ into two pieces (with points of $\mathcal{A}$ lying strictly on one side of the vertical hyperplane, and other points lying strictly on the other side). Hence, if Slater's condition is true, then a vertical hyperplane would fail to be a supporting hyperplane. [Again refer to Fig. 19 in the above notes to visualize.]
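
In coordinates: a vertical hyperplane through $(f(x^*), 0, \ldots, 0)$ has the form $\sum_{i=1}^k \theta_i y_i = 0$ with $(\theta_1, \ldots, \theta_k) \neq 0$. Under Slater's condition, $\mathcal{A}$ contains the whole box $\{(y_0, \ldots, y_k) : y_0 \geq f(s), \ y_i \geq g_i(s) \ \forall i\}$, and since every $g_i(s) < 0$ this box contains points with $\sum_{i=1}^k \theta_i y_i < 0$ as well as points with $\sum_{i=1}^k \theta_i y_i > 0$. So $\mathcal{A}$ has points strictly on both sides of any such hyperplane, which therefore cannot support $\mathcal{A}$.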
