Proof for the mutual information between two random variables

information theory, proof-explanation

I am trying to understand a proof in this paper which asks you to show that the mutual information between two random variables $X$ and $Y$ is a certain quantity.

Here are the definitions of the random variables:

  • $X$ is a mixture of a point mass at $0$ and an exponential distribution. Its distribution is
    $$ P(X=0) = \frac{b}{a+b}, \qquad p_X(x) = \frac{a}{(a+b)^2}\,e^{-x/(a+b)}, \quad x>0, $$
    with $a,b>0$.
  • $Y$ is $X + N$, where $N$ is an exponential random variable such that
    $$ p_N(t)=\frac{1}{b}\,e^{-t/b}, \quad t\geq 0 $$
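
Before getting to the proof, here is a quick numerical sanity check of this setup. It is my own sketch, not from the paper: it assumes the exponential part of $X$'s mixture carries probability $a/(a+b)$ and has mean $a+b$ (matching the normalised density above), and it checks that $Y = X+N$ behaves like an exponential with mean $a+b$.

```python
# Monte Carlo sanity check of the setup above (my own sketch, not from the paper).
# Assumption: X = 0 with probability b/(a+b), otherwise X ~ Exp(mean a+b),
# which matches the mixture density p_X(x) = a/(a+b)^2 * exp(-x/(a+b)) for x > 0.
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 1.0
n = 200_000

# Sample X: point mass at 0 with probability b/(a+b), else Exp with mean a+b.
is_atom = rng.random(n) < b / (a + b)
x = np.where(is_atom, 0.0, rng.exponential(scale=a + b, size=n))

# Sample the exponential noise N with mean b, independently of X.
noise = rng.exponential(scale=b, size=n)
y = x + noise

# If Y = X + N is exponential with mean a+b, these should match closely.
print("E[Y]      :", y.mean(), " expected:", a + b)
print("Var[Y]    :", y.var(), " expected:", (a + b) ** 2)
print("P(Y > a+b):", (y > a + b).mean(), " expected:", np.exp(-1.0))
```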

The proof involves showing that the mutual information between $X$ and $X+N$ is $\log(1+\frac{a}{b})$.

A short, succinct proof is given in this paper on p. 8, which relies on the assumption that $Y$ is exponential (apparently this is easy to prove using the Laplace transform, although I'm not too sure about that point; let's just assume it). How does the proof work? It's only three lines long, but I'm having trouble understanding how one gets to the initial statement:

$$ I(X;Y) = E\left[\log\frac{p_N(Y-X)}{p_Y(Y)}\right] $$

And how the subsequent statements follow to give $\log(1+\frac{a}{b})$.
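
For reference, the claimed value can also be checked numerically. The sketch below is mine, not the paper's, and it takes for granted that $Y$ is exponential with mean $a+b$; it simply averages $\log\frac{p_N(Y-X)}{p_Y(Y)}$ over samples, which should land near $\log(1+\frac{a}{b})$.

```python
# Monte Carlo check of the displayed identity (my own sketch, not from the paper).
# Assumption: Y = X + N is exponential with mean a+b, so p_Y(y) = exp(-y/(a+b))/(a+b).
import numpy as np

rng = np.random.default_rng(1)
a, b = 2.0, 1.0
n = 500_000

# Sample X from the point-mass/exponential mixture and N from Exp(mean b).
is_atom = rng.random(n) < b / (a + b)
x = np.where(is_atom, 0.0, rng.exponential(scale=a + b, size=n))
y = x + rng.exponential(scale=b, size=n)

log_p_noise = -np.log(b) - (y - x) / b    # log p_N(Y - X)
log_p_y = -np.log(a + b) - y / (a + b)    # log p_Y(Y) under the exponential assumption

print("MC estimate of E[log(p_N(Y-X)/p_Y(Y))]:", np.mean(log_p_noise - log_p_y))
print("log(1 + a/b)                          :", np.log(1.0 + a / b))
```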

Note that I have tried proving it myself, and this is as far as I got:

$$
I(X;Y) = H(Y) - H(Y|X) \\
\text{Use the definition of entropy: } H(Y) = E_Y[-\log p_Y(Y)] \\
\therefore I(X;Y) = E_Y[-\log p_Y(Y)] + E_{X,Y}[\log p_{Y|X}(Y|X)]
$$

I know that we probably use the fact that $Y=X+N$ somehow, but I don't see how the numerator suddenly switches from $p_{Y|X}$ to $p_N$.

N.B. Please don't judge me too harshly, as I'm still starting out with formal proofs and finding things incredibly difficult, especially when reading and trying to understand others' proofs. Also, any recommendations on how I can improve my abilities in this area would be very welcome, since the kind of mathematics in these papers is mind-boggling to me (HOW did they know to do that? WHY did they choose to introduce those variables? etc.).

Best Answer

First of all, the argument below relies on $X$ and $N$ being independent, which I think is the only way this works out.

I am going to proceed in the opposite direction: $$E\left[\log \frac{p_N(Y-X)}{p_Y(Y)} \right] = E\left[ \log p_N(Y-X) -\log p_Y(Y) \right] = E\left[ \log p_N(Y-X) \right] - E\left[ \log p_Y(Y) \right]$$

As you pointed out, $H(Y) = -E\left[ \log p_Y(Y) \right]$, so it remains to prove that $E\left[ \log p_N(Y-X) \right] = -H(Y|X)$. Observe that $Y = X + N$, so given $X$, the only uncertainty left in $Y$ is due to $N$. More formally, for every $x$ in the support of $X$, the conditional density is $p_{Y|X=x}(y) = p_N(y-x)$; each conditional distribution is nothing but the distribution of $N$ shifted by $x$. Since (differential) entropy depends only on the distribution and not on where it is located, $H(Y|X=x) = H(N)$ for every $x$, and hence $H(Y|X) = H(N)$. Likewise, $E\left[ \log p_N(Y-X) \mid X=x \right] = \int p_N(y-x)\log p_N(y-x)\,dy = -H(N)$, so averaging over $X$ gives $E\left[ \log p_N(Y-X) \right] = -H(N) = -H(Y|X)$, which is exactly what we wanted to show.
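
To get from here to the claimed value, combine this with the assumption from the question that $Y$ is exponential with mean $a+b$ (while $N$ is exponential with mean $b$). Using the fact that the differential entropy of an exponential distribution with mean $\mu$ is $\log(e\mu)$:

$$
\begin{aligned}
I(X;Y) &= H(Y) - H(Y\mid X) = H(Y) - H(N) \\
&= \log\bigl(e(a+b)\bigr) - \log(eb) \\
&= \log\frac{a+b}{b} = \log\left(1+\frac{a}{b}\right).
\end{aligned}
$$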
