[Math] Is There an Induction-Free Proof of the ‘Be The Leader’ Lemma?

ca.classical-analysis-and-odes, convex optimization, induction, machine learning

This lemma is used in the context of online convex optimisation. It is sometimes called the 'Be the Leader Lemma'.

Lemma:
Suppose $f_1,f_2,\ldots,f_N$ are real valued functions with the same domain. Define each $F_n = f_1 + \ldots + f_n$ and let $y_n$ be a minimiser of $F_n$. Then we have $\sum_{i=1}^N f_i(y_i) \le \sum_{i=1}^N f_i(x)$ for all $x$ in the domain.

At first glance I couldn't see any reason this should be true. Obviously, if we have more control over the various inputs, we can make the sum smaller; for example, if each $y_n$ were a minimiser of $f_n$ the result would be obvious. But that's not how the $y_n$ are defined. In fact each $y_n$ seems to depend only very weakly on the function we're plugging it into, and more weakly still as $n$ grows.
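For what it's worth, the inequality is easy to check numerically. Here is a minimal sketch, assuming random one-dimensional quadratics and a fine grid standing in for the compact domain (all names in the snippet are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# A fine grid standing in for the common compact domain.
domain = np.linspace(-5.0, 5.0, 2001)

# Random one-dimensional quadratics f_i(t) = a_i (t - b_i)^2, evaluated on the grid.
N = 8
a = rng.uniform(0.5, 2.0, size=N)
b = rng.uniform(-3.0, 3.0, size=N)
f_vals = [a[i] * (domain - b[i]) ** 2 for i in range(N)]

# y_n is a minimiser of F_n = f_1 + ... + f_n, taken over the grid.
F_vals = np.cumsum(f_vals, axis=0)
y_idx = [np.argmin(F_vals[n]) for n in range(N)]

lhs = sum(f_vals[i][y_idx[i]] for i in range(N))  # sum_i f_i(y_i)
rhs = F_vals[-1]                                  # sum_i f_i(x) at every grid point x
assert lhs <= rhs.min() + 1e-9                    # the lemma, on this instance
print(lhs, rhs.min())
```

(The grid itself is a perfectly legitimate finite domain for the lemma, so the assertion is not an artefact of the discretisation.)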

In the online optimisation context we receive enemy convex functions $f_1,f_2,\ldots$ with the same (compact convex) domain and we must select inputs $x_1,x_2,\ldots$ for $f_1,f_2,\ldots$ respectively. The catch is we must select each $x_{n+1}$ knowing only $x_1,x_2,\ldots, x_n$ and $f_1,f_2,\ldots, f_n$ and without knowing the function $f_{n+1}$ we're going to plug $x_{n+1}$ into.
Our goal is to make the average $\frac{1}{n}\sum_{i=1}^n \big( f_i(x_i) - f_i(x) \big)$ tend to zero for each $x$ in the domain as $n \to \infty$.
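In particular, if we were allowed to choose each $x_i = y_i$, the lemma would give, for every $x$ in the domain and every $n$,

$$\frac{1}{n}\sum_{i=1}^n \big( f_i(y_i) - f_i(x) \big) \le 0,$$

so those inputs would meet the goal with room to spare.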

Since each $y_{n+1}$ in the lemma requires us to look ahead in time to see $f_{n+1}$, we cannot use the lemma directly to get a valid strategy. But there are some strategies which you can prove are valid by comparing them to the 'Be the Leader strategy'.

The proof of the lemma is surprisingly short. It uses induction and I find it completely unenlightening.

The base case $N = 1$ is immediate, since $y_1$ is a minimiser of $F_1 = f_1$. Now suppose the lemma holds for some value of $N$. That is to say

$\sum_{i=1}^N f_i(y_i) \le \sum_{i=1}^N f_i(x)$ for all $x$ in the domain.

Next we add $f_{N+1}(x)$ to both sides:

$\sum_{i=1}^N f_i(y_i) + f_{N+1}(x) \le \sum_{i=1}^{N+1} f_i(x)$ for all $x$ in the domain.

In particular, taking $x = y_{N+1}$, we have

$\sum_{i=1}^N f_i(y_i) + f_{N+1}(y_{N+1}) \le \sum_{i=1}^{N+1} f_i(y_{N+1})$.

But the right-hand side is $F_{N+1}(y_{N+1})$, and since $y_{N+1}$ minimises $F_{N+1}$, replacing $y_{N+1}$ there by any $x$ in the domain can only increase it. Hence

$\sum_{i=1}^{N+1} f_i(y_i) \le \sum_{i=1}^{N+1} f_i(x)$ for all $x$ in the domain.

This doesn't clear up the problem with the intuition, namely that each $y_n$ depends only very weakly on $f_n$. Can anyone see a way to prove the lemma without using induction? Perhaps by bundling the $n$ leapfrogging optimisations together into a single optimisation of a function over some sort of more complicated domain?

The Be The Leader Lemma makes no assumptions about the domain or about the functions. In particular it does not require the $f_i$ to be continuous. If it allows for a more transparent proof, feel free to assume, for example, that the functions $f_i$ are continuous and the domain is a compact convex subset of some $\mathbb R^M$.

Best Answer

Is the following any clearer? (Read the subscripts carefully!)

$$\begin{array}{ll} f_1(y_1) + f_2(y_2) + f_3(y_3) + \cdots + f_n(y_n) &\leq \\ f_1(y_2) + f_2(y_2) + f_3(y_3) + \cdots + f_n(y_n) & \leq \\ f_1(y_3) + f_2(y_3) + f_3(y_3) + \cdots + f_n(y_n) & \leq \\ \qquad \qquad \qquad \vdots & \leq \\ f_1(y_n) + f_2(y_n) + f_3(y_n) + \cdots + f_n(y_n) & \leq \\ f_1(x) + f_2(x) + f_3(x) + \cdots + f_n(x). &\\ \end{array}$$
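Passing from each line to the next changes only the first block of terms: going from line $k$ to line $k+1$, the first $k$ summands switch from being evaluated at $y_k$ to being evaluated at $y_{k+1}$ (at $x$ in the final step), so every step is a single instance of the defining property of some $y_k$:

$$F_k(y_k) \le F_k(y_{k+1}) \quad (1 \le k \le n-1), \qquad F_n(y_n) \le F_n(x).$$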

I don't think this is different from the induction proof in any serious way, but sometimes "$\cdots$" is more intuitive than a formal induction.


Here is a remark that might be useful: There are only $n(n+1)$ relevant variables: $f_i(y_j)$ and $f_i(x)$. The hypotheses are linear inequalities between them, and so is the conclusion.

One way to state Farkas' Lemma (aka Linear Programming Duality) is that, if we have a collection of inequalities $\sum_i A_{ij} x_i \geq 0$ in variables $x_i$, and they imply that $\sum_i b_i x_i \geq 0$, then there must be some nonnegative $c_j$ such that $\sum_i b_i x_i = \sum_j c_j \left( \sum_i A_{ij} x_i \right)$. In words, if some collection of linear inequalities implies another linear inequality, then there must be a linear proof.

So there had to be a way to write the conclusion as a linear consequence of the hypotheses, and I just had to find it.
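Concretely, the certificate takes each of the step inequalities $F_k(y_k) \le F_k(y_{k+1})$, for $k = 1, \ldots, n-1$, together with $F_n(y_n) \le F_n(x)$, with coefficient $1$; adding them up, everything telescopes and what is left is exactly

$$\sum_{i=1}^n f_i(y_i) \le \sum_{i=1}^n f_i(x).$$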
