Optimization – Use of Min-Max Equality (Strong Duality) in Proof

convex-optimization, convex-analysis, duality-theorems, optimization

I'm attempting to follow a proof in a paper. In the proof, the authors make use of what I believe is the min-max inequality and exploit strong duality, but I'm unsure of the steps that are glossed over. Specifically, how do the authors get from the second to third equality in the equations repeated below?

I am aware of the min-max inequality in the context of the Lagrangian, but I don't know how it is used below. Classically, for the Lagrangian, we minimize over the decision variables $\mathbf{x}$ and maximize over the multipliers $\mathbf{\lambda}$, but in the equation below $\mathbf{z}$ denotes the decision variables, and we seem to be maximizing over them, so I'm a bit confused.

For clarity, Equation 3.15, which is referenced in the equations below, is simply the definition of the convex conjugate, and $h(\mathbf{z})$ is assumed to be a convex function. Additionally, $\lVert \, \cdot \, \rVert$ is some norm, $\lVert \, \cdot \, \rVert_\ast$ is the corresponding dual norm, and $^\prime$ denotes transpose.
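For reference, Equation (3.15) is presumably the standard conjugate/biconjugate pair for a proper, closed, convex $h$ (my paraphrase of the definition, written in the question's notation):
$$
h^\ast(\mathbf{\theta}) = \sup_{\mathbf{z}} \left[ \mathbf{\theta}^\prime \mathbf{z} - h(\mathbf{z}) \right], \qquad h(\mathbf{z}) = \sup_{\mathbf{\theta}\in\mathbf{\Theta}} \left[ \mathbf{\theta}^\prime \mathbf{z} - h^\ast(\mathbf{\theta}) \right],
$$
and it is the second (biconjugation) identity that produces the first equality in the derivation quoted below.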

Thank you for any and all help.

Reference to paper (pg. 60):

Ruidi Chen and Ioannis Ch. Paschalidis (2020), "Distributionally Robust Learning", Foundations and Trends in Optimization: Vol. 4, No. 1–2, pp. 1–243. DOI: 10.1561/2400000026


$$
\sup_{\mathbb{Q}\in \Omega} \mathbb{E}_{\mathbb{Q}}\left[ h(\mathbf{z})\right] = \min_{\lambda \geq 0} \left\{ \lambda \epsilon + \frac{1}{N} \sum_{i=1}^N \sup_{\mathbf{z} \in \mathcal{Z}} \left[h(\mathbf{z}) -\lambda\lVert\mathbf{z}-\mathbf{z}_i \rVert\right] \right\} \tag{3.17}
$$

Using (3.15), we may write the inner maximization in (3.17) as:
$$
\begin{aligned}
\sup_{\mathbf{z} \in \mathcal{Z}} \left[h(\mathbf{z}) -\lambda\lVert\mathbf{z}-\mathbf{z}_i \rVert\right]
&= \sup_{\mathbf{z} \in \mathcal{Z}}\sup_{\mathbf{\theta}\in\mathbf{\Theta}} \left[ \mathbf{\theta}^\prime \mathbf{z} - h^\ast(\mathbf{\theta}) - \lambda\lVert \mathbf{z} - \mathbf{z}_i\rVert \right]\\
&= \sup_{\mathbf{z} \in \mathcal{Z}}\sup_{\mathbf{\theta}\in\mathbf{\Theta}} \inf_{\lVert\mathbf{r} \rVert_\ast \leq \lambda} \left[ \mathbf{\theta}^\prime \mathbf{z} - h^\ast(\mathbf{\theta}) + \mathbf{r}^\prime(\mathbf{z}-\mathbf{z}_i) \right]\\
&= \sup_{\mathbf{\theta}\in\mathbf{\Theta}} \inf_{\lVert\mathbf{r} \rVert_\ast \leq \lambda}\sup_{\mathbf{z} \in \mathcal{Z}} \left[ (\mathbf{\theta} + \mathbf{r})^\prime \mathbf{z} - h^\ast(\mathbf{\theta}) - \mathbf{r}^\prime\mathbf{z}_i \right] \\
&\leq \sup_{\mathbf{\theta}\in\mathbf{\Theta}} \inf_{\lVert\mathbf{r} \rVert_\ast \leq \lambda}\sup_{\mathbf{z} \in \mathbb{R}^d} \left[ (\mathbf{\theta} + \mathbf{r})^\prime \mathbf{z} - h^\ast(\mathbf{\theta}) - \mathbf{r}^\prime\mathbf{z}_i \right]
\end{aligned}
$$

where the second equality follows from the definition of the dual norm and the third equality uses duality. The inner maximization over $\mathbf{z}\in\mathbb{R}^d$ achieves $\infty$ unless $\mathbf{r} = -\mathbf{\theta}$.
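To unpack the dual-norm step (my reconstruction, not the paper's wording): since $\lVert\mathbf{x}\rVert = \sup_{\lVert\mathbf{r}\rVert_\ast \leq 1} \mathbf{r}^\prime\mathbf{x}$, we have
$$
-\lambda\lVert\mathbf{z}-\mathbf{z}_i\rVert = -\sup_{\lVert\mathbf{r}\rVert_\ast \leq \lambda} \mathbf{r}^\prime(\mathbf{z}-\mathbf{z}_i) = \inf_{\lVert\mathbf{r}\rVert_\ast \leq \lambda} \mathbf{r}^\prime(\mathbf{z}-\mathbf{z}_i),
$$
where the last equality substitutes $\mathbf{r}\mapsto-\mathbf{r}$, which leaves the ball $\{\mathbf{r}:\lVert\mathbf{r}\rVert_\ast\leq\lambda\}$ unchanged. Once the relaxed inner supremum forces $\mathbf{r}=-\mathbf{\theta}$, the bracket collapses to $\mathbf{\theta}^\prime\mathbf{z}_i - h^\ast(\mathbf{\theta})$, which is presumably how the proof continues toward the right-hand side of (3.17).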

Best Answer

The min-max inequality has a general form that is independent of the concept of the Lagrangian: you can use it for any function, and you get the inequality for free. If you want equality (which, actually, you do not need here), you can also appeal to a minimax theorem, again without mentioning the Lagrangian. As minimax theorems are derived from duality theory, the authors of your book might have been referring to this in their comment in the proof.
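For concreteness, the general max-min (weak duality) inequality holds for any function $K$ and any sets $C$, $D$:
$$
\sup_{u\in C}\,\inf_{v\in D} K(u,v) \;\leq\; \inf_{v\in D}\,\sup_{u\in C} K(u,v),
$$
since $\inf_{v\in D} K(\bar u, v) \leq K(\bar u,\bar v) \leq \sup_{u\in C} K(u,\bar v)$ for every fixed $\bar u\in C$ and $\bar v\in D$; taking the supremum over $\bar u$ on the left and the infimum over $\bar v$ on the right gives the claim. A minimax theorem such as the one below supplies conditions under which this holds with equality.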

I recommend the book Convex Analysis by R. Tyrrell Rockafellar to learn about duality theory and how it is used to derive minimax theorems; there is really a lot in that book on the topic. For the case at hand, we can use the following minimax theorem (Corollary 37.3.2):

Let $C$ and $D$ be non-empty closed convex sets in $\mathbb{R}^m$ and $\mathbb{R}^n$, respectively, and let $K$ be a continuous finite concave-convex function on $C \times D$. If either $C$ or $D$ is bounded, one has $\sup_{u\in C}\inf_{v\in D} K(u,v) = \inf_{v\in D}\sup_{u\in C}K(u,v)$.

So in your case, $\mathcal{Z}$ is a closed convex set (which you forgot to mention), so you can define $K_{\mathbf{\theta}}(\mathbf{z},\mathbf{r}) = \mathbf{\theta}^\prime\mathbf{z} - h^\ast(\mathbf{\theta}) + \mathbf{r}^\prime(\mathbf{z}-\mathbf{z}_i)$ on the domain $C \times D$ with $C = \mathcal{Z}$ and $D = \{\mathbf{r} : \lVert\mathbf{r}\rVert_\ast \le \lambda\}$, and apply the above minimax theorem, because $K_{\mathbf{\theta}}$ is affine in each of its variables (so it is concave-convex) and $D$ is bounded (and closed and convex): $$ \sup_{\mathbf{z}\in\mathcal{Z}}\sup_{\mathbf{\theta}\in\mathbf{\Theta}}\inf_{\lVert\mathbf{r}\rVert_\ast\le \lambda} K_{\mathbf{\theta}}(\mathbf{z},\mathbf{r}) = \sup_{\mathbf{\theta}\in\mathbf{\Theta}}\sup_{\mathbf{z}\in\mathcal{Z}}\inf_{\lVert\mathbf{r}\rVert_\ast\le \lambda} K_{\mathbf{\theta}}(\mathbf{z},\mathbf{r}) = \sup_{\mathbf{\theta}\in\mathbf{\Theta}}\inf_{\lVert\mathbf{r}\rVert_\ast\le \lambda}\sup_{\mathbf{z}\in\mathcal{Z}} K_{\mathbf{\theta}}(\mathbf{z},\mathbf{r}). $$
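As a quick numerical sanity check (a toy instance of my own choosing, not from the paper or from Rockafellar): take $d=1$, $h(z)=\tfrac12 z^2$ so that $h^\ast(\theta)=\tfrac12\theta^2$, the absolute value as the norm (it is its own dual), $\mathcal{Z}=[-1,1]$, and a fixed $\theta$. A coarse grid search then shows the two sides of the minimax identity agreeing up to grid resolution.

```python
import numpy as np

# Toy instance (my own choice): d = 1, h(z) = z^2/2 so h*(theta) = theta^2/2,
# norm = |.| (self-dual), Z = [-1, 1] (closed, convex), z_i = 0.3, lambda = 0.5.
theta, z_i, lam = 0.8, 0.3, 0.5

def K(z, r):
    # K_theta(z, r) = theta*z - h*(theta) + r*(z - z_i);
    # affine (hence concave) in z and affine (hence convex) in r.
    return theta * z - 0.5 * theta**2 + r * (z - z_i)

zs = np.linspace(-1.0, 1.0, 2001)          # C = Z = [-1, 1]
rs = np.linspace(-lam, lam, 2001)          # D = {r : |r| <= lambda}, bounded

Z, R = np.meshgrid(zs, rs, indexing="ij")  # evaluate K on the grid
vals = K(Z, R)

sup_inf = vals.min(axis=1).max()           # sup_z inf_r K(z, r)
inf_sup = vals.max(axis=0).min()           # inf_r sup_z K(z, r)

print(sup_inf, inf_sup)                    # agree up to grid resolution (minimax equality)
```

Here $\theta$ is held fixed, which is all the corollary is applied to; interchanging the two outer suprema over $\mathbf{z}$ and $\mathbf{\theta}$ in the display above is immediate.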
