Relative Entropy Equality for Bernoulli Random Variables

entropy, pr.probability, probability-distributions, reference-request

We are given two joint probability distributions, $p$ and $q$, of $n$ Bernoulli random variables $X_1, X_2, \ldots, X_n$.

We denote by $p(x_k\mid x^{k-1})$ the probability $\mathbb{P}_p(X_k=x_k\mid X_1=x_1,\ldots, X_{k-1}=x_{k-1})$ and by $q(x_k\mid x^{k-1})$ the probability $\mathbb{P}_q(X_k=x_k\mid X_1=x_1,\ldots, X_{k-1}=x_{k-1})$, where the subscript on $\mathbb{P}$ indicates which probability mass function we are referring to. For any integer $1\le m\le n$, $x^{m}$ therefore denotes the realization $x_1,x_2,\ldots,x_{m}$ of $X_1,X_2,\ldots,X_{m}$.
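Concretely, whenever the conditioning event has positive probability under $p$, these conditionals can be written in terms of the joint pmf as

$$p\left(x_k\mid x^{k-1}\right)=\frac{p\left(x^{k}\right)}{p\left(x^{k-1}\right)}=\frac{\mathbb{P}_p\left(X_1=x_1,\ldots,X_k=x_k\right)}{\mathbb{P}_p\left(X_1=x_1,\ldots,X_{k-1}=x_{k-1}\right)}\,,$$

where $p\left(x^{m}\right)$ denotes the marginal probability of the prefix $x^{m}$ under $p$; the analogous expression defines $q\left(x_k\mid x^{k-1}\right)$.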


Question: How can we prove that

$$D(p\parallel q)=\sum_{k=1}^{n}\frac{1}{2^{k-1}}\sum_{x^{k-1}\in\{0,1\}^{k-1}}D\left(p\left(x_k\mid x^{k-1}\right)\parallel q\left(x_k\mid x^{k-1} \right) \right)\,,$$

where $D(p\parallel q):=\sum_{x^n\in\{0,1\}^n} \left(p(x^n)\log\left(\frac{p(x^n)}{q(x^n)}\right)\right)$ is the Kullback–Leibler divergence between $p$ and $q$?
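For concreteness, here is a small Python sketch (not part of the original question) that evaluates both sides of the conjectured identity for joint pmfs given as dictionaries indexed by tuples in $\{0,1\}^n$, assuming $q>0$ wherever $p>0$ and that all conditioning events have positive probability; the helper names `kl_joint`, `conditional`, `kl_bernoulli`, and `rhs_conjecture` are illustrative choices, not established API.

```python
import itertools
from math import log

def kl_joint(p, q, n):
    """D(p || q) for two joint pmfs on {0,1}^n, given as dicts
    mapping n-tuples of 0/1 to probabilities (log = ln)."""
    return sum(p[x] * log(p[x] / q[x])
               for x in itertools.product((0, 1), repeat=n)
               if p[x] > 0)

def conditional(d, prefix):
    """Conditional pmf of X_k given X^{k-1} = prefix, as {0: ., 1: .}."""
    def marginal(event):
        return sum(d[x] for x in d if x[:len(event)] == event)
    denom = marginal(prefix)
    return {b: marginal(prefix + (b,)) / denom for b in (0, 1)}

def kl_bernoulli(a, b):
    """D(a || b) for two pmfs on {0,1}, given as dicts."""
    return sum(a[i] * log(a[i] / b[i]) for i in (0, 1) if a[i] > 0)

def rhs_conjecture(p, q, n):
    """Right-hand side of the conjectured identity: the conditional
    divergences weighted by 1/2^(k-1) rather than by p(x^{k-1})."""
    return sum(
        kl_bernoulli(conditional(p, prefix), conditional(q, prefix)) / 2 ** (k - 1)
        for k in range(1, n + 1)
        for prefix in itertools.product((0, 1), repeat=k - 1)
    )
```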



Note: I tried to use the chain rule for the Kullback–Leibler divergence, but so far without success. Am I missing something trivial?

Best Answer

This equality does not hold in general. E.g., suppose that $n=2$, $$\big(\mathbb{P}_p(X_1=i,X_2=j)\colon\; i=0,1,\,j=0,1\big)= \frac1{30}\,\begin{pmatrix} 4 & 9 \\ 9 & 8 \\ \end{pmatrix},$$ and $$\big(\mathbb{P}_q(X_1=i,X_2=j)\colon\; i=0,1,\,j=0,1\big)= \frac1{17}\,\begin{pmatrix} 2 & 6 \\ 2 & 7 \\ \end{pmatrix}.$$

Then the left- and right-hand sides of your conjectured equality are $0.132\ldots$ and $0.118\ldots$, respectively (assuming $\log=\ln$).


The calculations were done in Mathematica (shown in a screenshot in the original answer, not reproduced here).
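As a sanity check, plugging these two matrices into the `kl_joint` and `rhs_conjecture` helpers sketched after the question (a Python substitute for the Mathematica screenshot, not the original computation) reproduces the two values. For context, the standard chain rule for relative entropy weights the stage-$k$ conditional divergence by $p(x^{k-1})$ rather than by $2^{-(k-1)}$, which is why the uniform-weight version cannot be expected to hold in general.

```python
# Joint pmfs of the counterexample, indexed by (x1, x2); log = ln.
p = {(0, 0): 4/30, (0, 1): 9/30, (1, 0): 9/30, (1, 1): 8/30}
q = {(0, 0): 2/17, (0, 1): 6/17, (1, 0): 2/17, (1, 1): 7/17}

print(kl_joint(p, q, 2))        # left-hand side:  ~0.1329
print(rhs_conjecture(p, q, 2))  # right-hand side: ~0.1186
```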
