Mutual Information satisfies the chain rule: $$I(X,Y;Z) = I(X;Z) + I(Y;Z|X).$$
The chain rule is useful, and the proof is simply linearity of expectation.
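As a sanity check, the chain rule can be verified numerically on a random joint distribution. A minimal sketch in Python/NumPy (the ternary alphabets and all variable names are illustrative choices, not from the discussion above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative example: a random joint pmf over (X, Y, Z), each variable ternary.
P = rng.random((3, 3, 3))
P /= P.sum()

def mi(PAB):
    """Mutual information I(A;B) in nats for a joint pmf PAB over (A, B)."""
    PA = PAB.sum(axis=1, keepdims=True)
    PB = PAB.sum(axis=0, keepdims=True)
    mask = PAB > 0
    return float(np.sum(PAB[mask] * np.log(PAB[mask] / (PA * PB)[mask])))

# I(X,Y;Z): treat the pair (X, Y) as a single variable on a 9-symbol alphabet.
I_XY_Z = mi(P.reshape(9, 3))

# I(X;Z) from the (X, Z) marginal.
I_X_Z = mi(P.sum(axis=1))

# I(Y;Z|X) = sum_x P_X(x) * I(Y;Z) under the conditional pmf given X = x.
PX = P.sum(axis=(1, 2))
I_Y_Z_given_X = sum(PX[x] * mi(P[x] / PX[x]) for x in range(3))

print(I_XY_Z, I_X_Z + I_Y_Z_given_X)  # the two sides agree up to rounding
```

Flattening the pair $(X,Y)$ with `reshape` is valid because mutual information depends only on the joint pmf, not on how the pair is labeled.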
Sometimes we want something stronger than mutual information. Mutual information is the expectation of the pointwise mutual information, i.e., $I(X;Y) = \mathbb{E}\left[\log\left(\frac{P_{XY}(X,Y)}{P_X(X)P_Y(Y)}\right)\right]$. We might want high-probability bounds of the form $\mathbb{P}\left[\log\left(\frac{P_{XY}(X,Y)}{P_X(X)P_Y(Y)}\right)\le\varepsilon\right]\ge1-\delta$. This leads us to notions of mutual information based on Rényi divergences. In particular, we consider Sibson's definition of mutual information:
$$I_\alpha(X;Y) = \min_{Q_Y} D_\alpha\left(P_{XY}\middle\| P_X \times Q_Y\right),$$ where $D_\alpha(P\|Q) = \frac{1}{\alpha-1} \log \sum_x P(x)^\alpha Q(x)^{1-\alpha}$ is the Rényi divergence of order $\alpha$.
This definition coincides with the usual definition of mutual information as $\alpha\to1$.
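One way to see this limit numerically (a hedged sketch; the random marginal `PX` and channel `PYgX` are illustrative choices): for $\alpha > 1$ we have $I(X;Y) \le I_\alpha(X;Y) \le D_\alpha(P_{XY}\|P_X \times P_Y)$, and the upper bound tends to $I(X;Y)$ as $\alpha \to 1$, squeezing $I_\alpha(X;Y)$ to the same limit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative example: random marginal P_X and channel P_{Y|X}.
PX = rng.random(3)
PX /= PX.sum()
PYgX = rng.random((3, 4))
PYgX /= PYgX.sum(axis=1, keepdims=True)
PXY = PX[:, None] * PYgX        # joint pmf of (X, Y)
PY = PXY.sum(axis=0)            # marginal of Y

def renyi_div(P, Q, a):
    """Renyi divergence D_a(P||Q) in nats for pmfs P, Q of the same shape."""
    return float(np.log(np.sum(P**a * Q**(1 - a))) / (a - 1))

# Shannon mutual information I(X;Y) in nats.
I1 = float(np.sum(PXY * np.log(PXY / (PX[:, None] * PY[None, :]))))

# The upper bound D_alpha(P_XY || P_X x P_Y) approaches I(X;Y) as alpha -> 1.
for a in (1.5, 1.1, 1.01, 1.001):
    print(a, renyi_div(PXY, PX[:, None] * PY[None, :], a), I1)
```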
Sibson's mutual information $I_\alpha(X;Y)$ is the Rényi divergence between the joint distribution of $(X,Y)$ and a product distribution that matches the marginal distribution of $X$. But the marginal distribution of $Y$ is not necessarily matched; thus Sibson's mutual information is not symmetric. Rather than matching the marginal of $Y$, we choose this second factor to minimize the divergence.
We can analytically minimize the Rényi divergence to obtain a closed-form expression for Sibson's mutual information:
$$\begin{aligned} I_\alpha(X;Y) &= \frac{\alpha}{\alpha-1}\log\left( \sum_y \left( \sum_x P_X(x) \cdot P_{Y|X}(y|x)^\alpha \right)^{1/\alpha} \right) \\ &= \frac{\alpha}{\alpha-1} \log \mathbb{E}_Y \left[\exp\left(\frac{\alpha-1}{\alpha}D_\alpha(P_{X|Y}\|P_X)\right)\right]. \end{aligned}$$
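The minimization can be carried out explicitly: writing $f(y) = \sum_x P_X(x) P_{Y|X}(y|x)^\alpha$, we have $D_\alpha(P_{XY}\|P_X \times Q_Y) = \frac{1}{\alpha-1}\log\sum_y Q_Y(y)^{1-\alpha} f(y)$, which is minimized by $Q_Y^*(y) \propto f(y)^{1/\alpha}$, yielding the closed form. A small numerical sketch (the distributions are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 2.0

# Illustrative example: random P_X over 4 symbols, random channel P_{Y|X} to 5 symbols.
PX = rng.random(4)
PX /= PX.sum()
PYgX = rng.random((4, 5))
PYgX /= PYgX.sum(axis=1, keepdims=True)
PXY = PX[:, None] * PYgX

def renyi_div(P, Q, a):
    """Renyi divergence D_a(P||Q) in nats for pmfs P, Q of the same shape."""
    return float(np.log(np.sum(P**a * Q**(1 - a))) / (a - 1))

# Closed form: I_alpha(X;Y) = alpha/(alpha-1) * log sum_y f(y)^{1/alpha},
# where f(y) = sum_x P_X(x) * P_{Y|X}(y|x)^alpha.
f = (PX[:, None] * PYgX**alpha).sum(axis=0)
I_sibson = alpha / (alpha - 1) * np.log(np.sum(f**(1 / alpha)))

# The minimizer Q*(y), proportional to f(y)^{1/alpha}, attains the closed form,
# and no other Q_Y does better.
Qstar = f**(1 / alpha)
Qstar /= Qstar.sum()
at_opt = renyi_div(PXY, PX[:, None] * Qstar[None, :], alpha)

for _ in range(100):
    Q = rng.random(5)
    Q /= Q.sum()
    assert renyi_div(PXY, PX[:, None] * Q[None, :], alpha) >= I_sibson - 1e-12

print(I_sibson, at_opt)  # equal up to rounding
```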
My question: Does Sibson's mutual information satisfy some (weak) form of the chain rule? Specifically, I have the following conjecture.
Conjecture: Let $X$ and $Y$ be independent and let $Z$ depend on $X$ and $Y$ arbitrarily. Let $\alpha>1$. Then $$I_\alpha(X,Y;Z) \le \max_y I_\alpha(X;Z|Y=y) + \max_x I_\alpha(Y;Z|X=x).$$
This conjecture is weaker than the chain rule for the usual mutual information in three ways: First, we take the max over $x$ in the second term, rather than the expectation. Second, we take the max over $y$ in the first term, rather than marginalizing it. Third, we assume $X$ and $Y$ are independent.
I suspect that we can actually prove something stronger than this conjecture (i.e., without all three weakenings).
If we expand out the definition of Sibson's mutual information, then the conjecture above is equivalent to
$$\sum_z \left( \sum_x \sum_y P_X(x) \cdot P_Y(y) \cdot P_{Z|XY}(z|x,y)^\alpha \right)^{1/\alpha}$$ $$\le \left( \max_{y_*} \sum_z \left( \sum_x P_X(x) \cdot P_{Z|XY}(z|x,y_*)^\alpha \right)^{1/\alpha} \right) \cdot \left( \max_{x_*} \sum_z \left( \sum_y P_Y(y) \cdot P_{Z|XY}(z|x_*,y)^\alpha \right)^{1/\alpha}\right)$$
Best Answer
The conjecture is not true in general. A counterexample can be constructed as follows.
Suppose $X, Y, Z$ take values in $\{1, 2, 3, 4\}$, with $$ \begin{align*} P_{Z|X, Y}(z|x, y) = \begin{cases} 1& \text{if $x = y = z$},\\ 0& \text{if $x = y$, $z \neq x$},\\ 1/4& \text{if $x \neq y$}. \end{cases} \end{align*} $$ Suppose $P_X$ and $P_Y$ are arbitrary distributions with positive probability masses. Let $\alpha = \infty$, interpreted as the limit in which $\left(\sum_x P(x) f(x)^\alpha\right)^{1/\alpha} \to \max_{x : P(x) > 0} f(x)$. Then
\begin{align*}\text{LHS} &= \sum_z \left( \sum_x \sum_y P_X(x) \cdot P_Y(y) \cdot P_{Z|XY}(z|x,y)^\alpha \right)^{1/\alpha}\\ &= \sum_z \left(\max_{x, y}P_{Z|XY}(z|x,y)\right)\\ &= 4. \end{align*}
For all $y_* = 1, 2, 3, 4$, we have $$ \sum_z \left( \sum_x P_X(x) \cdot P_{Z|XY}(z|x,y_*)^\alpha \right)^{1/\alpha} = \sum_z \left(\max_x P_{Z|XY}(z|x,y_*)\right) = 1 + 3 \cdot \frac14 < 2. $$ By symmetry, for all $x_* = 1, 2, 3, 4$, we have $$\sum_z \left( \sum_y P_Y(y) \cdot P_{Z|XY}(z|x_*,y)^\alpha \right)^{1/\alpha} < 2. $$ Therefore, $\text{LHS} = 4 > \text{RHS}$.
Remark: The counterexample above takes $\alpha = \infty$, but both sides of the inequality are continuous in $\alpha$, so the same construction with alphabet size $K \geq 4$ again yields a counterexample for $\alpha$ sufficiently large (but finite).
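This can be checked numerically: with a moderately large finite $\alpha$, the same channel still violates the conjectured (expanded) inequality. A hedged sketch with uniform marginals (any strictly positive $P_X, P_Y$ would do):

```python
import numpy as np

alpha = 50.0
K = 4

# The channel from the counterexample: P(z|x,y) = 1 if x = y = z,
# 0 if x = y and z != x, and 1/4 whenever x != y.
PZgXY = np.full((K, K, K), 0.25)
for x in range(K):
    PZgXY[x, x, :] = 0.0
    PZgXY[x, x, x] = 1.0

# Uniform marginals for X and Y.
PX = np.full(K, 1 / K)
PY = np.full(K, 1 / K)

# LHS: sum_z ( sum_{x,y} P_X(x) P_Y(y) P(z|x,y)^alpha )^{1/alpha}.
lhs = np.sum(np.einsum('x,y,xyz->z', PX, PY, PZgXY**alpha) ** (1 / alpha))

# RHS: product of the two max-over-frozen-coordinate factors.
rhs_y = max(np.sum((PX @ PZgXY[:, y, :]**alpha) ** (1 / alpha)) for y in range(K))
rhs_x = max(np.sum((PY @ PZgXY[x, :, :]**alpha) ** (1 / alpha)) for x in range(K))

print(lhs, rhs_y * rhs_x)  # lhs exceeds the product, violating the conjecture
```

As $\alpha$ grows, `lhs` approaches $4$ while each RHS factor stays below $2$, matching the $\alpha = \infty$ calculation.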