I have two random variables $X$ and $Y$. $X$ follows a Poisson-Binomial distribution with parameters $\{q_1, \ldots, q_k\}$. Thus, $X$ can take values in the set $\{0,1,\ldots,k\}$.
$Y$ is a binary random variable: $Y=0$ with probability $\frac{0.1}{1+x}$ when $X=x$. Thus, the transition probability $P(Y=0 \mid X=x) = \frac{0.1}{1+x}$ is decreasing in $x$.
I am interested in the mutual information between $X$ and $Y$ denoted by $I(X;Y)$.
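For concreteness, here is a minimal Python sketch of how $I(X;Y)$ can be evaluated numerically for a given $\{q_i\}$ under the channel $P(Y=0 \mid X=x) = \frac{0.1}{1+x}$ (the helper names `poisson_binomial_pmf` and `mutual_information` are just illustrative):

```python
import numpy as np

def poisson_binomial_pmf(q):
    """PMF of X = sum of independent Bernoulli(q_i), built by repeated convolution."""
    pmf = np.array([1.0])
    for qi in q:
        pmf = np.convolve(pmf, [1.0 - qi, qi])
    return pmf  # pmf[x] = P(X = x) for x = 0, ..., k

def mutual_information(q, channel=lambda x: 0.1 / (1.0 + x)):
    """I(X;Y) in nats, where P(Y=0 | X=x) = channel(x)."""
    px = poisson_binomial_pmf(np.asarray(q, dtype=float))
    p_y0 = channel(np.arange(len(px)))
    joint = np.stack([px * p_y0, px * (1.0 - p_y0)], axis=1)  # P(X=x, Y=y)
    py = joint.sum(axis=0)                                    # marginal of Y
    mask = joint > 0
    ratio = joint[mask] / np.outer(px, py)[mask]              # P(x,y) / (P(x) P(y))
    return float(np.sum(joint[mask] * np.log(ratio)))
```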
Specifically, I have a conjecture that, for a given $\sum_i q_i$ and $k$, the mutual information $I(X;Y)$ is maximized when all the $q_i$'s are equal, which leads to a Binomial distribution for $X$.
Also, my intuition is that any transition probability that is decreasing in $x$ (similar to $\frac{0.1}{1+x}$) leads to the same conclusion.
Can someone help me prove this, or provide a counterexample?
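As a rough numerical probe (reusing `mutual_information` from the sketch above), one can draw random $\{q_i\}$ and compare each against the equal-$q$ vector with the same sum; a positive margin throughout is consistent with the conjecture, while a negative one would give a counterexample:

```python
# Probe: compare random q's against the equal-q (Binomial) point with the same sum.
rng = np.random.default_rng(0)
k = 8
worst_gap = float("inf")
for _ in range(2000):
    q = rng.uniform(0.01, 0.99, size=k)
    gap = mutual_information(np.full(k, q.mean())) - mutual_information(q)
    worst_gap = min(worst_gap, gap)
print(f"smallest margin of the equal-q point over random q's: {worst_gap:.6f}")
```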
$\underline{\text{Some useful facts that I noted}}$:
- For a given $\sum q_i$ and $k$, the Binomial distribution has the maximum variance among Poisson-Binomial distributions.
- The entropy of the Poisson-Binomial distribution is bounded above by the entropy of the Binomial distribution with the same $k$ and $\sum q_i$ (a quick numerical illustration follows this list).
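The illustration of the second fact, again reusing `poisson_binomial_pmf` from the sketch above (the `entropy` helper is just illustrative):

```python
# Check: H(Poisson-Binomial) <= H(Binomial) for the same k and sum of q_i.
def entropy(pmf):
    p = pmf[pmf > 0]
    return float(-np.sum(p * np.log(p)))

q = np.array([0.1, 0.7, 0.3, 0.9, 0.5])
h_pb = entropy(poisson_binomial_pmf(q))
h_binom = entropy(poisson_binomial_pmf(np.full(len(q), q.mean())))
print(h_pb <= h_binom, h_pb, h_binom)
```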
I have asked this question on cstheory SE as well to gather attention, since Information Theory topics have a comparatively small audience.
Best Answer
I believe your conjecture is correct! But this relies on the result of Shepp and Olkin referenced on the wiki for the Poisson-Binomial distribution ("Entropy of the sum of independent Bernoulli random variables and of the multinomial distribution", 1981), which states that the entropy of the Poisson-Binomial distribution is a concave function of $(p_1, \dots, p_n) \in [0,1]^n$. I can't access that proof right now, so I'll just assume it holds and move forward.
If we accept this result, then it suffices to locate a critical point of the function, as this critical point must be a global maximum by concavity.
Restricting to the case of a fixed mean $\mu := \sum_{i=1}^n p_i$ for $X$, we want to show that the point $\mathbf{p}^* = (p_1, \dots, p_n) = (\mu/n, \dots, \mu/n)$ describes a critical point of the function $$ I(X;Y) = \sum_{k=0}^n \sum_{\ell=0}^1 P(X=k, Y=\ell) \log\left({P(X=k, Y=\ell) \over P(X=k) P(Y=\ell)} \right). $$ Writing $\mathbf{p} = (p_1, \dots, p_n)$ and $\mathbb{1} = (1, \dots, 1)$, then we are working only with points $\mathbf{p}$ that lie in the plane $M = \{ \mathbf{x} : \mathbf{x} \cdot \mathbb{1} = \mu \}$.
We can simplify the joint distribution, writing $P(X=k, Y=\ell) = P(Y=\ell | X=k) P(X=k)$, so that $$ I(X;Y) = \sum_{k=0}^n P(X=k) \sum_{\ell=0}^1 P(Y=\ell | X=k) \log\left({P(Y=\ell | X=k) \over P(Y=\ell)} \right). $$
Now $P(Y=\ell|X=k)$ is independent of the $p_i$, so directional derivatives of $I(X;Y)$ w.r.t. the $p_i$ will be determined by the directional derivatives of the terms $P(X=k)$ and $P(Y=\ell)$. I do this using the directions $e_{i,j}$, which are $+1$ in the $p_i$-direction and $-1$ in the $p_j$-direction. In short,
by symmetry of $P(X=k)$ in the $p_i$, at $\mathbf{p}^* = (\mu/n) \mathbb{1}$ all the $\partial_{e_{i,j}} P(X=k)$ must be equal; but $e_{i,j} = - e_{j,i}$, so $\partial_{e_{i,j}} P(X=k) = -\partial_{e_{j,i}} P(X=k)$, and being both equal and negatives of each other, these derivatives must all be zero. Since $P(Y=\ell) = \sum_k P(Y=\ell \mid X=k) P(X=k)$, its derivatives vanish as well, so $\mathbf{p}^*$ is a critical point of $I(X;Y)$ in the plane $M$. And we're done!
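One explicit way to see why these derivatives vanish: conditioning on the $i$-th Bernoulli gives $$ P(X=k) = p_i\, P(X_{-i} = k-1) + (1-p_i)\, P(X_{-i} = k) \quad\Longrightarrow\quad \frac{\partial}{\partial p_i} P(X=k) = P(X_{-i} = k-1) - P(X_{-i} = k), $$ where $X_{-i}$ is the Poisson-Binomial sum of all the Bernoullis except the $i$-th. At $\mathbf{p}^* = (\mu/n)\mathbb{1}$ every $X_{-i}$ has the same $\mathrm{Binomial}(n-1, \mu/n)$ distribution, so these partials do not depend on $i$, and $\partial_{e_{i,j}} P(X=k) = \partial_{p_i} P(X=k) - \partial_{p_j} P(X=k) = 0$.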
EDIT:

At this point we know that $I(X;Y)$ has a critical point at ${\mu \over n} \mathbb{1}$, but its concavity is not clear from the Shepp-Olkin concavity of the entropy of $X$. Instead, we use a basic fact about mutual information (mentioned by OP, and with a proof here by Mark Braverman): with $P(Y|X)$ fixed, $I(X;Y)$ is concave in $P(X)$.
And here $P(Y|X)$ is indeed constant with respect to the $\{p_i\}$, so $I(X;Y)$ is concave (in the $\{p_i\}$).
This completes the proof, without reference to the structure of $Y$'s distribution.
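As a sanity check of this concavity claim, one can probe midpoint concavity of $I(X;Y)$ numerically along segments that stay inside the plane $M$ (reusing `mutual_information` from the sketch in the question); a nonnegative margin is consistent with concavity:

```python
# Probe: midpoint concavity of I(X;Y) along segments between a vector and a
# permutation of it (permutations preserve the sum, so the segment stays in M).
rng = np.random.default_rng(1)
k = 6
worst = float("inf")
for _ in range(1000):
    a = rng.uniform(0.01, 0.99, size=k)
    b = rng.permutation(a)
    margin = mutual_information((a + b) / 2) - 0.5 * (mutual_information(a) + mutual_information(b))
    worst = min(worst, margin)
print(f"smallest midpoint-concavity margin observed: {worst:.6f}")
```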
Some Notes: