The I-projection of a distribution to a family of distributions

information theory, probability, probability distributions, statistics

For a finite set $X$, consider the distributions on $X \times X$ whose two marginals are equal, and let $\mathcal{L}$ denote the family of all such distributions on $X \times X$.

I would like to determine the I-projection onto $\mathcal{L}$ of the product distribution $\widetilde{Q}= Q_1 \times Q_2$.

I think I should first prove that the I-projection $\widetilde{P}$ has to be of product form, $\widetilde{P} = P \times P$, so that $D(\widetilde{P}||Q_1 \times Q_2)=D(P||Q_1)+D(P||Q_2)$.

I know a theorem that states: if $Q_1, \dots, Q_n$ are arbitrary distributions over the finite sets $X_1, \dots, X_n$, and $P$ is an arbitrary distribution over $X_1 \times \cdots \times X_n$ with marginals $P_1, \dots, P_n$, then $$D(P||Q_1\times \cdots \times Q_n)=D(P||P_1\times \cdots \times P_n) + \sum_{i=1}^n D(P_i||Q_i).$$
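This decomposition is easy to check numerically. Below is a quick sanity check of the $n = 2$ case in pure Python, using a random joint distribution and hypothetical marginals $Q_1, Q_2$ on a three-point set (all specific values are made up for illustration):

```python
import math
import random

def kl(p, q):
    """Discrete KL divergence D(p||q); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

random.seed(0)
n = 3  # |X1| = |X2| = 3 (hypothetical)

# random joint distribution P on X1 x X2
P = [[random.random() for _ in range(n)] for _ in range(n)]
s = sum(map(sum, P))
P = [[v / s for v in row] for row in P]

Q1 = [0.5, 0.3, 0.2]  # hypothetical
Q2 = [0.2, 0.2, 0.6]  # hypothetical

P1 = [sum(P[i]) for i in range(n)]                       # first marginal
P2 = [sum(P[i][j] for i in range(n)) for j in range(n)]  # second marginal

P_flat  = [P[i][j] for i in range(n) for j in range(n)]
Q_prod  = [Q1[i] * Q2[j] for i in range(n) for j in range(n)]
Pm_prod = [P1[i] * P2[j] for i in range(n) for j in range(n)]

lhs = kl(P_flat, Q_prod)
rhs = kl(P_flat, Pm_prod) + kl(P1, Q1) + kl(P2, Q2)
print(abs(lhs - rhs) < 1e-12)  # → True: the identity holds
```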

Then, writing the right-hand side as a single sum, I should show via the log-sum inequality that the minimum is attained at $P(x)=c \sqrt{Q_1(x)Q_2(x)}$ for a normalising constant $c$.

Unfortunately I fail at both steps.

Best Answer

If we let $\mathcal{L}_P \subset \mathcal{L}$ be the family of joint distributions on $X \times X$ with marginals $P$, then from the equality you state we can notice that for any $\widetilde{P} \in \mathcal{L}_P$, $$ D( \widetilde{P}\|Q_1 \times Q_2) = D(\widetilde{P}\|P\times P) + D(P\|Q_1) + D(P\|Q_2) \\ \ge D(P\|Q_1) + D(P\|Q_2).$$

Further, $\widetilde{P} = P \times P \in \mathcal{L}_P$ and satisfies the above with equality.

Immediately, we have that the minimiser of $D(\widetilde{P} \|Q_1 \times Q_2)$ over $\mathcal{L}_P$ must be the product distribution $P \times P$. But this holds for every $P$, and $\mathcal{L} = \bigcup_{P} \mathcal{L}_P$, so the minimiser of $D(\widetilde{P} \|Q_1 \times Q_2)$ over $\mathcal{L}$ must also be a product distribution.
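As a sanity check, one can verify this numerically: convex mixtures of the independent coupling $P \times P$ and the diagonal coupling both have marginals $P$, hence lie in $\mathcal{L}_P$, and the bound $D(P\|Q_1)+D(P\|Q_2)$ should be met with equality only at the independent coupling. A sketch in pure Python (the specific $P, Q_1, Q_2$ are hypothetical):

```python
import math

def kl(p, q):
    """Discrete KL divergence D(p||q); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# hypothetical distributions on a 3-point X
P  = [0.5, 0.3, 0.2]
Q1 = [0.4, 0.4, 0.2]
Q2 = [0.2, 0.3, 0.5]
n = len(P)

bound  = kl(P, Q1) + kl(P, Q2)
Q_prod = [Q1[i] * Q2[j] for i in range(n) for j in range(n)]

indep = [[P[i] * P[j] for j in range(n)] for i in range(n)]  # P x P
diag  = [[P[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

for lam in [0.0, 0.25, 0.5, 0.75, 1.0]:
    # a convex mixture of couplings with marginals P again has marginals P
    J = [lam * indep[i][j] + (1 - lam) * diag[i][j]
         for i in range(n) for j in range(n)]
    assert kl(J, Q_prod) >= bound - 1e-12

J_indep = [indep[i][j] for i in range(n) for j in range(n)]
print(abs(kl(J_indep, Q_prod) - bound) < 1e-12)  # → True: equality at P x P
```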

That's the first part done - you'd almost got there. Let's do the second part.

We want to minimise $$ f(P) := D(P\|Q_1) + D(P\|Q_2).$$ I'll do this for discrete $X$, but the argument generalises trivially. $$ f(P) = \sum_x P(x) \log \frac{P(x)^2}{Q_1(x) Q_2(x)} = 2 \sum_x P(x) \log \frac{P(x)}{\sqrt{Q_1(x) Q_2(x)}}$$

Define $R(x) = \frac{\sqrt{Q_1(x) Q_2(x)}}{ Z}$ where $Z = \sum_x \sqrt{Q_1(x) Q_2(x)}$. Notice that $R$ is a distribution.

We can further write $$ f(P) = 2\sum_x P(x) \log \frac{P(x)}{ZR(x)} = -2\log Z + 2D(P\|R).$$

Now, the first term $-2\log Z$ in the above is a constant: it depends on $(Q_1, Q_2)$, but not on our decision variable $P$. The second term is a KL divergence, so it is non-negative and vanishes exactly when $P = R$. Hence $f$ is minimised at $P = R$, and we're done.
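One can confirm the conclusion numerically: $f(R)$ should equal $-2\log Z$, and no other distribution should do better. A small pure-Python sketch with hypothetical $Q_1, Q_2$ on a three-point set:

```python
import math
import random

def kl(p, q):
    """Discrete KL divergence D(p||q); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

Q1 = [0.5, 0.3, 0.2]  # hypothetical
Q2 = [0.2, 0.2, 0.6]  # hypothetical

Z = sum(math.sqrt(a * b) for a, b in zip(Q1, Q2))
R = [math.sqrt(a * b) / Z for a, b in zip(Q1, Q2)]

def f(P):
    return kl(P, Q1) + kl(P, Q2)

# the minimum value is -2 log Z, attained at R
assert abs(f(R) + 2 * math.log(Z)) < 1e-12

# no randomly drawn distribution does better
random.seed(1)
for _ in range(1000):
    w = [random.random() for _ in range(len(Q1))]
    s = sum(w)
    assert f([v / s for v in w]) >= f(R) - 1e-12

print("f(R) = -2 log Z =", f(R))
```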
