Proof of first order condition for quasiconvex functions (2)

convex optimizationconvex-analysis

Background

Recall that a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is quasiconvex if $\operatorname{dom}(f)$ is convex and the sublevel sets
$$S_\alpha = \{x \in \operatorname{dom}(f)\,\vert\,f(x) \leq \alpha\}$$
are convex for all $\alpha$.

Equivalently, we can state $f$ is quasiconvex iff $\forall x,\,y \in \operatorname{dom}(f),\,\forall \theta \in [0,\,1]$, we have
$$ f(z) \leq \max\{f(x),\,f(y)\},$$ where $z := \theta x + (1 – \theta) y$.

Question

If $f$ also happens to be differentiable on its domain, the first order condition holds:

$$f\text{ is quasiconvex } \iff (f(x) \leq f(y) \implies \nabla f(x)^T(y – x) \leq 0).$$

There are two parts to this proof: in the "only if" ($\implies$) direction (which can be found here) and in the "if" ($\impliedby$) direction.

The latter direction is the focus of this question; I would like to know how to prove the statement $\forall x,\,y \in \operatorname{dom}(f)$
$$f\text{ is quasiconvex } \impliedby (f(x) \leq f(y) \implies \nabla f(x)^T(y – x) \leq 0).$$

Current atempt

Suppose $\forall x,\,y \in \operatorname{dom}(f)$ we know $f(x) \leq f(y) \implies \nabla f(x)^T(y – x) \leq 0$.

Without loss of generality, assume $x$ and $y$ are such that $f(x) \leq f(y)$. Then, we know

  1. $\nabla f(x)^T(y – x) \leq 0$.
  2. For $f$ to be quasiconvex, we need to have $f(z) \leq f(x)$.

This is not much in the way of an "attempt", but I have a feeling these two facts are related. I am just not sure if and how to use (1.) to show (2.). Does it have something to do with, maybe, the directional derivative of $f$ at $x$ in the direction $y – x$?

Best Answer

Your idea is right, but it is not sufficient to use the inequality in a single point, you have to use it "globally" and integrate. Here is what I mean.

Fix any $x,y \in \mathrm{dom}(f)$, and suppose without loss of generality that $f(x) \leq f(y)$. Let $\gamma : [0,1] \to \mathbb{R}^n$, $\gamma(t) = (1-t)y+tx$, and consider $z_0 = \gamma(t_0)$ the maximum point of $f(\gamma(t))$, i.e., the maximum point of $f$ restricted to the segment from $x$ to $y$. The point is well defined because $f(\gamma(t))$ is continuous on a compact set. Then, by the fundamental theorem of calculus, $$ f(z_0) = f(y) + \int_0^{t_0} \nabla f(\gamma(t)) \cdot (x-y)\ dx = f(y) + \int_0^{t_0} \frac{1}{t_0-t} \nabla f(\gamma(t)) \cdot ((t_0-t)x-(t_0-t)y)\ dx = f(y) + \int_0^{t_0} \frac{1}{t_0-t} \nabla f(\gamma(t)) \cdot (z_0 - \gamma(t))\ dx. $$

Since, for evey $t \in [0,1]$, $f(\gamma(t)) \leq f(\gamma(t_0))$ by definition of $t_0$, your properties implies $\nabla f(\gamma(t)) \cdot (z_0 - \gamma(t)) \leq 0$. Moreover the factor $\frac{1}{t_0-t}$ is positive, hence $$ f(z_0) = f(y) + \int_0^{t_0} \frac{1}{t_0-t} \nabla f(\gamma(t)) \cdot (z_0 - \gamma(t))\ dx \leq f(y), $$ as wanted.