Understanding mutual information derivation

information-theory, statistics

Mutual information is the KL divergence between the joint distribution and the product of the marginals, and the proof I'm reading expands it like this:

$$
\begin{align*}
I(X;Y) &= D\big(p(x,y)\,\|\,p(x)p(y)\big)\\
&= \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)}\\
&= \sum_{x,y} p(x,y)\log p(x,y) - \sum_{x,y} p(x,y)\log p(x) - \sum_{x,y} p(x,y)\log p(y)
\end{align*}
$$

Then the proof rewrites each of these sums as an entropy term:

$$
\begin{align*}
\sum_{x,y} p(x,y)\log p(x) &= -H(X)\\
\sum_{x,y} p(x,y)\log p(y) &= -H(Y)
\end{align*}
$$

I don't understand how that works for these two sums. For $H(X)$, the distribution inside the log, $p(x)$, doesn't match the distribution $p(x,y)$ that weights it, so isn't the sum taken over a different distribution?

Best Answer

Observe that by the law of total probability $\sum_y p(x,y)=p(x)$, hence
$$
\begin{align*}
\sum_{x,y} p(x,y) \log p(x) &= \sum_{x} \Big(\sum_y p(x,y)\Big) \log p(x)\\
&= \sum_{x} p(x) \log p(x)\\
&= -H(X).
\end{align*}
$$
The same argument applies to the other sum, giving $-H(Y)$.
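
To close the loop (this substitution is my own addition, using the standard identity $\sum_{x,y} p(x,y)\log p(x,y) = -H(X,Y)$), plugging the two results back into the expansion in the question gives
$$
I(X;Y) = -H(X,Y) + H(X) + H(Y).
$$

If it helps, here is a quick numerical sanity check (my own toy example, not part of the original answer) that the marginalization step does what is claimed, assuming an arbitrary $2\times 3$ joint distribution:

```python
import numpy as np

# Toy joint distribution p(x, y): rows index x, columns index y.
p_xy = np.array([[0.10, 0.25, 0.15],
                 [0.20, 0.05, 0.25]])
assert np.isclose(p_xy.sum(), 1.0)

# Marginal p(x) via the law of total probability: sum over y.
p_x = p_xy.sum(axis=1)

# Left side: sum over (x, y) of p(x, y) * log p(x).
lhs = np.sum(p_xy * np.log(p_x)[:, None])

# Right side: sum over x of p(x) * log p(x), i.e. -H(X).
rhs = np.sum(p_x * np.log(p_x))

print(lhs, rhs)          # both print the same value, namely -H(X)
assert np.isclose(lhs, rhs)
```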