Concavity of entropic function (information bottleneck)

Let $X$ and $Y$ be statistically dependent variables and let $T$ be another random variable such that the Markov condition $Y\rightarrow X\rightarrow T$ holds. For some $\beta\in [0,1]$, the information bottleneck Lagrangian (see Defintion 4 of this paper) is defined as

$$\mathcal{L}_{IB}(T ; \beta)=I(T ; Y)-\beta I(X ; T)$$

$I(X:Y)$ for two finite dimensional random variables $X$ and $Y$ with joint distribution $p(X,Y)$ and marginal distributions $p(X)$ and $p(Y)$ is defined as

$$I(X:Y) = \sum_{x,y} p(x, y)\log \frac{p(x, y)}{p(x)p(y)}$$

Hence, we can rewrite $\mathcal{L}_{IB}$ as purely a function of $p(t|x)$ (all others in the expression below are given)

$$\mathcal{L}_{IB}=\sum_{y, t}\left[ \left(\sum_x p(t|x)p(x|y)p(y)\right) \log \frac{\left(\sum_x p(t|x)p(x|y)p(y)\right)}{p(y)\sum_{x} p(t|x)p(x)} \right]\\ – \beta \sum_{x, t} p(t | x) p(x) \log \frac{p(t | x)}{\sum_x p(t|x)p(x)}$$

$\mathcal{L}_{IB}$ is not concave in $p(t|x)$ for all $\beta$. Indeed $\beta = 0$ is a good counterexample. Yet multiple algorithms exist that maximize it. They are guaranteed to converge to the global maximum (although convergence speed is not provably bounded). How do these algorithms guarantee not to get stuck in a local maximum?

NB: I have edited the question since the discussion in the comments was very useful.

Best Answer

It is known that $I(X;T)$ is convex in $p(t|x)$ and $I(Y;T)$ is convex in $p(t|y)$. Given that $p(t|y) = \sum_x p(t|x) p(x|y)$ is a linear function of $p(t|x)$, and since a convex function of a linear function is convex, $I(Y;T)$ is also convex in $p(t|x)$. Combined, that means that $\mathcal{L}_{IB} = I(T;Y) - \beta I(T;X)$ is neither concave nor convex in $p(t|x)$.

In fact, in the general case, there are no known algorithms that are guaranteed to find the global maximum of the IB Lagrangian. The only exception to this is the case when $X$ and $Y$ are jointly Guassian (see Chechik, ‎2005).

Best Answer

Related Solutions

[Math] Upper bound of Mutual Information

Are there temperature, energy in mathematics

Related Question