Is there a chain rule for Sibson's mutual information?

information theory, mutual information, renyi-entropy

Mutual information satisfies the chain rule: $$I(X,Y;Z) = I(X;Z) + I(Y;Z|X).$$
The chain rule is useful, and the proof is simply linearity of expectation.
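As a quick numerical sanity check (not the proof), one can verify the chain rule on a random joint pmf; the following Python/NumPy sketch, with my own helper `mi` and an arbitrary $3\times3\times3$ alphabet, does exactly that.

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((3, 3, 3))      # joint pmf P_{XYZ} on a 3x3x3 alphabet, indexed [x, y, z]
P /= P.sum()

def mi(joint):
    """Shannon mutual information I(A;B) of a 2-D joint pmf, in nats."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    return np.sum(joint * np.log(joint / (pa * pb)))

I_xy_z = mi(P.reshape(9, 3))                 # I(X,Y;Z): treat (X,Y) as one variable
I_x_z = mi(P.sum(axis=1))                    # I(X;Z): marginalize out Y
P_x = P.sum(axis=(1, 2))
I_yz_given_x = sum(P_x[x] * mi(P[x] / P_x[x]) for x in range(3))   # I(Y;Z|X)

print(I_xy_z, I_x_z + I_yz_given_x)          # the two sides agree up to floating-point error
```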

Sometimes we want something stronger than mutual information. Mutual information is the expectation of the pointwise mutual information, i.e., $I(X;Y) = \mathbb{E}\left[\log\left(\frac{P_{XY}(X,Y)}{P_X(X)P_Y(Y)}\right)\right]$. We might want high-probability bounds of the form $\mathbb{P}\left[\log\left(\frac{P_{XY}(X,Y)}{P_X(X)P_Y(Y)}\right)\le\varepsilon\right]\ge1-\delta$. This leads us to notions of mutual information based on Rényi divergences. In particular, we consider Sibson's definition of mutual information:
$$I_\alpha(X;Y) = \min_{Q_Y} D_\alpha\left(P_{XY}\middle\| P_X \times Q_Y\right),$$ where $D_\alpha(P\|Q) = \frac{1}{\alpha-1} \log \sum_x P(x)^\alpha Q(x)^{1-\alpha}$ is the Rényi divergence of order $\alpha$.
This definition coincides with the usual definition of mutual information as $\alpha\to1$.
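For concreteness, here is a minimal finite-alphabet sketch of $D_\alpha$ in Python/NumPy (the function name and the example distributions are my own choices); as a sanity check, it approaches the KL divergence as $\alpha \to 1$:

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """D_alpha(p||q) = (1/(alpha-1)) * log sum_x p(x)^alpha * q(x)^(1-alpha), in nats.
    Assumes finite alphabets, strictly positive q, and alpha != 1."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.log(np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha - 1.0)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(renyi_divergence(p, q, 1.001))   # nearly equal to the next line
print(np.sum(p * np.log(p / q)))       # KL divergence D(p||q), the alpha -> 1 limit
```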

Sibson's mutual information $I_\alpha(X;Y)$ is the Rényi divergence between the joint distribution of $(X,Y)$ and a product distribution whose first factor matches the marginal distribution of $X$. The second factor need not match the marginal distribution of $Y$ – thus Sibson's mutual information is not symmetric. Rather than matching the marginal of $Y$, we choose this factor to minimize the divergence.

We can analytically minimize the Rényi divergence to obtain a closed-form expression for Sibson's mutual information:
$$\begin{align*} I_\alpha(X;Y) &= \frac{\alpha}{\alpha-1}\log\left( \sum_y \left( \sum_x P_X(x) \cdot P_{Y|X}(y|x)^\alpha \right)^{1/\alpha} \right) \\ &= \frac{\alpha}{\alpha-1} \log \mathbb{E}_Y \left[\exp\left(\frac{\alpha-1}{\alpha}D_\alpha(P_{X|Y}\|P_X)\right)\right]. \end{align*}$$
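For experimentation, here is a small Python/NumPy sketch of the first closed-form expression (the function name `sibson_mi` is my own; it assumes a finite alphabet, a strictly positive marginal $P_X$, and $\alpha \neq 1$). As $\alpha \to 1$ it matches the ordinary mutual information:

```python
import numpy as np

def sibson_mi(P_xy, alpha):
    """Sibson's I_alpha(X;Y) in nats, for a 2-D joint pmf P_xy[x, y], via
    (alpha/(alpha-1)) * log sum_y ( sum_x P_X(x) * P_{Y|X}(y|x)^alpha )^(1/alpha).
    Assumes finite alphabets, P_X > 0, and alpha != 1."""
    P_xy = np.asarray(P_xy, float)
    P_x = P_xy.sum(axis=1, keepdims=True)              # marginal of X, as a column vector
    P_y_given_x = P_xy / P_x                            # row x holds P_{Y|X=x}
    inner = (P_x * P_y_given_x ** alpha).sum(axis=0)    # sum_x P_X(x) P_{Y|X}(y|x)^alpha, per y
    return (alpha / (alpha - 1.0)) * np.log((inner ** (1.0 / alpha)).sum())

# Sanity check on a random joint pmf: I_alpha -> I(X;Y) as alpha -> 1.
rng = np.random.default_rng(0)
P = rng.random((4, 4)); P /= P.sum()
P_x, P_y = P.sum(axis=1, keepdims=True), P.sum(axis=0, keepdims=True)
print(sibson_mi(P, 1.001), np.sum(P * np.log(P / (P_x * P_y))))
```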

My question: Does Sibson's mutual information satisfy some (weak) form of the chain rule? Specifically, I have the following conjecture.

Conjecture: Let $X$ and $Y$ be independent and let $Z$ depend on $X$ and $Y$ arbitrarily. Let $\alpha>1$. Then $$I_\alpha(X,Y;Z) \le \max_y I_\alpha(X;Z|Y=y) + \max_x I_\alpha(Y;Z|X=x).$$

This conjecture is weaker than the chain rule for the usual mutual information in three ways: First, we take the max over $x$ in the second term, rather than the expectation. Second, we take the max over $y$ in the first term, rather than marginalizing it. Third, we assume $X$ and $Y$ are independent.
I suspect that we can actually prove something stronger than this conjecture (i.e., without all three weakenings).

If we expand out the closed-form expression for Sibson's mutual information, use independence to write $P_{X|Y=y_*}=P_X$ and $P_{Y|X=x_*}=P_Y$, and exponentiate (which preserves the inequality, since $\frac{\alpha}{\alpha-1}>0$ for $\alpha>1$), then the conjecture above is equivalent to
$$\begin{align*} &\sum_z \left( \sum_x \sum_y P_X(x) \cdot P_Y(y) \cdot P_{Z|XY}(z|x,y)^\alpha \right)^{1/\alpha} \\ &\quad\le \left( \max_{y_*} \sum_z \left( \sum_x P_X(x) \cdot P_{Z|XY}(z|x,y_*)^\alpha \right)^{1/\alpha} \right) \cdot \left( \max_{x_*} \sum_z \left( \sum_y P_Y(y) \cdot P_{Z|XY}(z|x_*,y)^\alpha \right)^{1/\alpha}\right). \end{align*}$$
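To experiment with this inequality numerically, here is a sketch (my own helper `conjecture_sides`, for finite alphabets) that evaluates both sides for given $P_X$, $P_Y$, $P_{Z|XY}$, and $\alpha > 1$:

```python
import numpy as np

def conjecture_sides(P_x, P_y, P_z_given_xy, alpha):
    """Both sides of the expanded conjecture, for alpha > 1.
    P_z_given_xy[x, y, z] is the conditional pmf of Z given X=x, Y=y."""
    P_x, P_y = np.asarray(P_x, float), np.asarray(P_y, float)
    W = np.asarray(P_z_given_xy, float) ** alpha
    lhs = (np.einsum('x,y,xyz->z', P_x, P_y, W) ** (1 / alpha)).sum()
    # max over y* of sum_z ( sum_x P_X(x) * P_{Z|XY}(z|x,y*)^alpha )^(1/alpha)
    f1 = (np.einsum('x,xyz->yz', P_x, W) ** (1 / alpha)).sum(axis=1).max()
    # max over x* of sum_z ( sum_y P_Y(y) * P_{Z|XY}(z|x*,y)^alpha )^(1/alpha)
    f2 = (np.einsum('y,xyz->xz', P_y, W) ** (1 / alpha)).sum(axis=1).max()
    return lhs, f1 * f2

# Example usage on a random instance (this does not prove anything either way).
rng = np.random.default_rng(0)
P_x = rng.random(3); P_x /= P_x.sum()
P_y = rng.random(3); P_y /= P_y.sum()
W = rng.random((3, 3, 3)); W /= W.sum(axis=2, keepdims=True)
print(conjecture_sides(P_x, P_y, W, alpha=2.0))
```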

Best Answer

The conjecture is not true in general. A counterexample can be constructed as follows.

Suppose $X, Y, Z$ take values in $\{1, 2, 3, 4\}$, with $$ \begin{align*} P_{Z|X, Y}(z|x, y) = \begin{cases} 1& \text{if $x = y = z$},\\ 0& \text{if $x = y$, $z \neq x$},\\ 1/4& \text{if $x \neq y$}. \end{cases} \end{align*} $$ Suppose $P_X$ and $P_Y$ are arbitrary distributions with strictly positive probability masses. Let $\alpha = \infty$. Then

\begin{align*}\text{LHS} &= \sum_z \left( \sum_x \sum_y P_X(x) \cdot P_Y(y) \cdot P_{Z|XY}(z|x,y)^\alpha \right)^{1/\alpha}\\ &= \sum_z \left(\max_{x, y}P_{Z|XY}(z|x,y)\right)\\ &= 4. \end{align*}

For all $y_* = 1, 2, 3, 4$, we have $$ \sum_z \left( \sum_x P_X(x) \cdot P_{Z|XY}(z|x,y_*)^\alpha \right)^{1/\alpha} = \sum_z \left(\max_x P_{Z|XY}(z|x,y_*)\right) = 1 + 3 \cdot \frac14 < 2. $$ By symmetry, for all $x_* = 1, 2, 3, 4$, we have $$\sum_z \left( \sum_y P_Y(y) \cdot P_{Z|XY}(z|x_*,y)^\alpha \right)^{1/\alpha} < 2. $$ Therefore, $\text{LHS} = 4 > \text{RHS}$.
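As a quick sanity check of this calculation, here is a short Python/NumPy sketch working directly in the $\alpha = \infty$ limit, where each inner weighted sum becomes a maximum over the support (which here is everything, since $P_X$ and $P_Y$ are strictly positive):

```python
import numpy as np

K = 4
W = np.full((K, K, K), 0.25)          # x != y: Z uniform on {1, 2, 3, 4}; indexed W[x, y, z]
for i in range(K):                    # x == y: Z = x with probability 1
    W[i, i, :] = 0.0
    W[i, i, i] = 1.0

lhs = W.max(axis=(0, 1)).sum()                 # sum_z max_{x,y} P_{Z|XY}(z|x,y) = 4
f1 = W.max(axis=0).sum(axis=1).max()           # max_{y*} sum_z max_x P_{Z|XY}(z|x,y*) = 7/4
f2 = W.max(axis=1).sum(axis=1).max()           # max_{x*} sum_z max_y P_{Z|XY}(z|x*,y) = 7/4
print(lhs, f1 * f2)                            # 4.0 > (7/4)^2 = 3.0625
```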


Remarks:

  1. This construction also serves as a counterexample for general dependence between $X$ and $Y$, i.e., when we replace $P_{X}P_{Y}$ in the LHS with $P_{X, Y}$, and replace $P_X, P_Y$ in the RHS with $P_{X|Y=y_*}$ and $P_{Y|X=x_*}$, respectively.
  2. Neither the zero-probability mass in $P_{Z|X, Y}$ nor the assumption $\alpha = \infty$ is necessary for constructing a counterexample. In general, one can consider $X, Y, Z$ taking values in $\{1, 2, \ldots, K\}$ with \begin{align*} P_{Z|X, Y}(z|x, y) = \begin{cases} p& \text{if $x = y = z$},\\ (1-p)/(K-1)& \text{if $x = y$, $z \neq x$},\\ 1/K& \text{if $x \neq y$}. \end{cases} \end{align*}

Then we can again obtain a counterexample whenever $pK \geq 4$, by taking $\alpha$ sufficiently large.
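For what it's worth, here is a self-contained numeric check of this construction (the specific choices $K=5$, $p=0.95$, $\alpha=50$, and uniform marginals are my own); it evaluates both sides of the expanded inequality and confirms that the LHS exceeds the RHS:

```python
import numpy as np

# Remark 2 construction with K = 5, p = 0.95 (so pK = 4.75 >= 4), alpha = 50,
# and uniform marginals P_X, P_Y (any strictly positive marginals would do).
K, p, alpha = 5, 0.95, 50.0
P_x = np.full(K, 1.0 / K)
P_y = np.full(K, 1.0 / K)

W = np.full((K, K, K), 1.0 / K)          # x != y: Z uniform on {1, ..., K}; indexed W[x, y, z]
for i in range(K):                        # x == y: Z = x w.p. p, else uniform on the rest
    W[i, i, :] = (1.0 - p) / (K - 1)
    W[i, i, i] = p

Wa = W ** alpha
lhs = (np.einsum('x,y,xyz->z', P_x, P_y, Wa) ** (1 / alpha)).sum()
rhs = ((np.einsum('x,xyz->yz', P_x, Wa) ** (1 / alpha)).sum(axis=1).max()
       * (np.einsum('y,xyz->xz', P_y, Wa) ** (1 / alpha)).sum(axis=1).max())
print(lhs, rhs)   # about 4.45 vs 2.95; in the alpha -> infinity limit, Kp = 4.75 vs (p + (K-1)/K)^2 = 3.0625
```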
