[Math] Why is the Fenchel conjugate of a piecewise linear function piecewise linear

convex optimizationconvex-analysis

Consider the function:

$$ f: x \mapsto
\begin{cases}
a_1 x + b_1 & x \le \frac{b_1 – b_2}{a_2 – a_1} \\
a_i x + b_i & \frac{b_{i-1} – b_i}{a_i – a_{i-1}} \le x \le \frac{b_i – b_{i+1}}{a_{i+1} – a_i} \quad \forall 1 \le i \le n-1 \\
a_nx + b_n & x \ge \frac{b_{n-1} – b_n}{a_n – a_{n-1}}
\end{cases} \, $$

where we can assume that $a_1 < \dots < a_n$. This is question 3.36 in Boyd and Vandenberghe. The solutions manual (p. 55 of the PDF) claims that:

For $a_i ≤ z ≤ a_{i+1}$, the supremum in the
definition of $f^∗$ is reached at the breakpoint between the segments $i$ and $i + 1$, i.e.,
at the point $(b_{i+1} − b_i)/(a_{i+1} − a_i)$, so we obtain
$$f^∗(z) = −b_i − (b_{i+1} − b_i)\frac{y − a_i}{ a_{i+1} − a_i}$$
where $i$ is defined by $a_i ≤ z ≤ a_{i+1}$.

Question: Why is this claim true? How can we prove that claim without knowing anything about the $b$'s or the relative sizes of the $(a_i – a_{i-1})$'s?

What I have figured out so far about $f^*(z)$ is included below, but it doesn't seem strong enough to allow me to reach the same conclusion.

Note: This is a follow-up to a previous question.

Work so far: One can write (note: $c_{i,i+1}$ is meant to denote the breakpoint between the $i$th and $(i+1)$st segments, specifically $(b_i – b_{i+1})/(a_{i+1} – a_i)$):

$$\begin{array}{rcl}
f^*(z) & = & \sup_{x \in \mathbb{R}}[ xz – f(x) ] \\
& = & \max_{i \in [n]} \left\{
\begin{array}{l}
\displaystyle \sup_{x \le c_{12}} [ xz – a_1 x – b_1] \,, \\
\displaystyle \sup_{c_{12} \le x \le c_{23}} [xz – a_2 x – b_2] \,, \\
\vdots \\
\displaystyle \sup_{x \ge c_{n-1, n}} [xz – a_n x – b_n]
\end{array}
\right\} \\
& = & \max_{i \in [n]} \left\{
\begin{array}{l}
\displaystyle \sup_{x \le c_{12}} [ (z – a_1)x – b_1] \,, \\
\displaystyle \sup_{c_{12} \le x \le c_{23}} [(z – a_2) x – b_2] \,, \\
\vdots \\
\displaystyle \sup_{x \ge c_{n-1, n}} [(z – a_n) x – b_n]
\end{array}
\right\} \,. \\
\end{array}$$

We note that the supremum terms over $[c_{i-1, i} , c_{i, i+1}]$ will always be finite (i.e. for $1 < i < n$), since we are always taking the supremum of a continuous function over a compact set (closed interval), making the extreme value theorem applicable. Thus, when checking to see when (if ever) $f^*(z)$ is unbounded, it suffices to only consider the terms corresponding to suprema over $(-\infty, c_{12}]$ and $[c_{n-1, n}, \infty)$.

Because a linear function is unbounded \textit{from above} (we don't need to consider whether it's unbounded from below since we're taking a supremum) as $x \to -\infty$ if and only if its slope is negative, we have that:

$$ \left[ \sup_{x \le c_{12}} [ (z-a_1) x – b_1 ] = +\infty \quad \iff \quad z – a_1 < 0 \right] \quad \quad \implies \quad \quad f^*(z) = +\infty \text{ whenever }z < a_1 \,, $$

since the maximum over a finite set will always be $+\infty$ if one of the elements of that set is $+\infty$.

Similarly, since a linear function is unbounded from above as $x \to \infty$ if and only if its slope is positive:

$$ \left[ \sup_{x \ge c_{n-1,n} } [ (z-a_n) x – b_n ] = +\infty \quad \iff \quad z – a_n > 0 \right] \quad \quad \implies \quad \quad f^*(z) = +\infty \text{ whenever }z > a_n \,. $$

Therefore we only need to do any work calculating $f^*(z)$ for $z$ in the closed interval $[a_1, a_n]$.

When $z \in [a_1, a_n]$, $z – a_1 \ge 0$, so the slope of $(z-a_1)x – b_1$ is non-negative, and thus the supremum of the first term occurs at the right endpoint ($c_{12}$). Similarly, $z-a_n \le 0$, so the slope of $(z – a_n)$ is non-positive, and the supremum of the last term occurs at the left endpoint ($c_{n-1, n}$).

For the terms $1 < i < n$, consider that the slope is non-positive, and thus the supremum occurs at the left endpoint ($c_{i-1, i}$) of the interval in question, only when $z \le a_i$, whereas the slope is non-negative, and the supremum occurs at the right endpoint ($c_{i,i+1}$) of the interval only when $z \ge a_i$. Therefore, when restricted to $[a_1, a_n]$, we have for $f^*(z)$ that:

$$\begin{array}{rcl}
f^*(z) & = & \max_{i \in [n]} \left\{
\begin{array}{l}
(z – a_1)c_{12} – b_1 \,, \\
\vdots\\
\begin{cases}
(z-a_i)c_{i-1,i} – b_i & z \in [a_1, a_i] \\
(z-a_i)c_{i,i+1} – b_i & z \in [a_i, a_n]
\end{cases} \\
\vdots \\
(z – a_n)c_{n-1, n} – b_n
\end{array}
\right\} \,. \\
\end{array}$$

Specifically, for a $z$ within a given $[a_i, a_{i+1}]$, we have:

$$\begin{array}{rcl}
f^*(z) & = & \max_{i \in [n]} \left\{
\begin{array}{l}
(z – a_1)c_{12} – b_1 \,, \\
\vdots\\
(z – a_{i-1})c_{i-1, i} – b_{i-1}\\
(z-a_i)c_{i,i+1} – b_i \\
(z – a_{i+1}) c_{i, i+1} – b_{i+1} \\
\vdots \\
(z – a_n)c_{n-1, n} – b_n
\end{array}
\right\} \\
& = & \max_{i \in [n]} \left\{
\begin{array}{l}
(z – a_1) \frac{b_1 – b_2}{a_2 – a_1} – b_1 \,, \\
\vdots\\
(z – a_{i-1})\frac{b_{i-1} – b_i}{a_i – a_{i-1}} – b_{i-1}\\
(z-a_i)\frac{b_i – b_{i+1}}{a_{i+1} – a_i} – b_i \\
(z – a_{i+1}) \frac{b_i – b_{i+1}}{a_{i+1} – a_i} – b_{i+1} \\
\vdots \\
(z – a_n)\frac{b_{n-1} – b_n}{a_n – a_{n-1}} – b_n
\end{array}
\right\} \,. \\
\end{array}$$

Note that for $j < i$, $z – a_j > z – a_i > 0$ due to the fact that $a_1 < \dots < a_n$. Similarly, for $j > i$, $z – a_j < 0 < z – a_i$. With the information above, it's very unclear to me how one could evaluate this expression further to get the result of Boyd and Vandenberghe.

Best Answer

Consider any $a_i, a_{i+1}$ and $z \in [a_i, a_{i+1}]$. No doubt, that

$$ f(x) = \begin{cases} a_ix+b_i, & x \in [\frac{b_{i-1}-b_i}{a_i - a_{i-1}};\ \frac{b_{i}-b_{i+1}}{a_{i+1} - a_{i}}]=c_i \\ a_{i+1}x+b_{i+1}, & x \in [\frac{b_{i}-b_{i+1}}{a_{i+1} - a_{i}};\ \frac{b_{i+1}-b_{i+2}}{a_{i+2} - a_{i+1}}]= c_{i+1} \end{cases} $$

We are interested in calculating

$$ f^*(z)=\sup_{x}\ (xz-\max_j(a_jx+b_j)) $$

Step 1. Let's prove, that

$$ \sup_{c_i \cup c_{i+1}}\ (xz-\max_{j = i, i+1 }(a_jx+b_j)) = x^*z-a_ix^*-b_i = x^*z-a_{i+1}x^*-b_{i+1} $$

where $x^* = \frac{b_{i}-b_{i+1}}{a_{i+1} - a_{i}}$. Since $c_i \cup c_{i+1}$ is a compact, we can write $max$ instead of $sup$. Since $z - a_i >0$:

$$ \text{argmax}_{c_i}\ (xz-a_ix-b_i) = \text{argmax}_{c_i}\ (x(z-a_i)-b_i) =x^* $$

And because $z - a_{i+1} < 0$:

$$ \text{argmax}_{c_{i+1}}\ (xz-a_{i+1}x-b_{i+1}) = \text{argmax}_{c_{i+1}}\ (x(z-a_{i+1})-b_{i+1}) =x^* $$

It's true, that $a_ix^*+b_i = a_{i+1}x^*+b_{i+1}$. Q.E.D.

Step 2.

It remains only to prove, that the following is true:

$$ \sup_{R\setminus \{c_i, c_{i+1}\}}\ (xz-f(x)) < \sup_{c_i \cup c_{j+1}}\ (xz-\max_{j = i, i+1}(a_jx+b_j)) = $$

Define $c_1 = [- \infty; \frac{b_1-b_2}{a_2-a_1}]$, $c_2 = [\frac{b_1-b_2}{a_2-a_1}; \frac{b_2-b_3}{a_3-a_2}]$. Since $z-a_1 > 0, z-a_2 > 0$:

$$ \sup_{c_1}\ (xz-a_1x-b_1) = \sup_{c_1}\ (x(z-a_1)-b_1) = \frac{b_1-b_2}{a_2-a_1}(z-a_1)-b_1 = \frac{b_1-b_2}{a_2-a_1}(z-a_2)-b_2 $$

$$ \sup_{c_2}\ (xz-a_2x-b_2) = \sup_{c_2}\ (x(z-a_2)-b_2) = \frac{b_2-b_3}{a_3-a_2}(z-a_2)-b_2 $$

No doubt, that

$$ \sup_{c_1} < \sup_{c_2} $$

By induction you can verify, that if $z \in [a_i; a_{i+1}]$

$$ \sup_{c_1} < \sup_{c_2} < ... < \sup_{c_i} $$

That's why we have the following:

$$ \sup_{c_1 \cup c_2 \cup ...\cup c_{i-1}} < \sup_{c_i \cup c_{i+1}} $$

Similarly, one can check, that:

$$ \sup_{c_{i+2} \cup c_{i+3} \cup ...\cup c_n} > \sup_{c_i \cup c_{i+1}} $$

Q.E.D.

All in all, we show, that

$$ \sup_{x}\ (xz-\max_j(a_jx+b_j)) = x^*z-a_ix^*-b_i $$

I don't think that is very strict and elegant proof, but I hope, that it is at least correct.

Best Answer

Related Solutions

[Math] How to rigorously show that maximum of linear functions is piecewise linear

Fenchel Conjugate of the Hinge Loss

Related Question