Entropy and Joint Probability Distributions for iid random variables

Tags: entropy, information theory, probability distributions

The question is the following:

Given two discrete iid random variables $X_1$ and $X_2$ with probability distribution $p$, show that $P(X_1=X_2) \geq \frac{1}{2^{H(p)}}$, and show that this holds with equality if and only if $p$ is uniform.

Using these lecture notes, I was able to reach this (trivial) step:

$$ P(X_1 = X_2) \geq 2^{-H(p)} = 2^{\mathbb{E}[\log_2 p(X)]} $$

I really don't know where to go from here. How do I get rid of the $\log p$? Is this related to conditional entropy or relative entropy at all? I've seen lots of resources on those online but couldn't relate them to this question.

I don't even understand this on a basic level. Intuitively, if $p$ is discrete and uniform over $n$ possible values, then $P(X_1=X_2) = \frac{1}{n}$, but I see no way to manipulate $2^{\mathbb{E}[\log_2 p(X)]}$ into $\frac{1}{n}$.

Feeling like an idiot and I'd love some guidance.

Best Answer

Below is an expanded version of the proof of this inequality given in Elements of Information Theory by T. M. Cover and J. A. Thomas (2nd edition, p. 40, Lemma 2.10.1).

If $X_1$ and $X_2$ are iid (i.e., $X_1 \sim p(x)$ and $X_2 \sim p(x)$), then

\begin{align} P(X_1 = X_2) &= \sum_{x \in \mathcal{X}} P(X_1 = x , X_2 = x) \\ &= \sum_{x \in \mathcal{X}} P(X_1 = x) P(X_2 = x) \\ &= \sum_{x \in \mathcal{X}} p(x) p(x) \\ &= \sum_{x \in \mathcal{X}} p^2(x) \end{align}
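As a quick numerical sanity check of this identity (the 4-symbol pmf below is just an arbitrary example, not anything from the book), one can estimate $P(X_1 = X_2)$ by simulation and compare it to $\sum_x p^2(x)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary example pmf on a 4-symbol alphabet.
p = np.array([0.5, 0.25, 0.125, 0.125])

# Exact collision probability: sum_x p(x)^2.
exact = np.sum(p ** 2)            # 0.34375

# Monte Carlo estimate: draw X1, X2 iid from p and count how often they match.
n = 200_000
x1 = rng.choice(len(p), size=n, p=p)
x2 = rng.choice(len(p), size=n, p=p)
estimate = np.mean(x1 == x2)

print(exact, estimate)            # the two numbers should be close (~0.34)
```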

By Jensen's inequality, if a function $f(y)$ is convex, then $\mathbb{E}_{p(x)} \left[ f(y) \right] \geq f\left( \mathbb{E}_{p(x)}[y] \right)$, with equality if and only if $y$ is a constant (the equality condition requires strict convexity, which holds for the function used below).

The function $f(y) = 2^{y}$ is strictly convex, so if $z = \mathbb{E}_{p(x)} \left[\phi \right]$, then

$$f(z) = f(\mathbb{E}_{p(x)} \left[\phi \right]) \leq \mathbb{E}_{p(x)} [f (\phi) ]$$

where

$$\mathbb{E}_{p(x)} [f (\phi) ] = \sum _{x \in \mathcal {X}} p(x) f(\phi)$$
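If it helps, here is a tiny numerical illustration of this form of Jensen's inequality; the weights (a pmf) and the values of $y$ below are made up for the demo and are not tied to the problem:

```python
import numpy as np

# Toy check of Jensen's inequality for the convex function f(y) = 2^y.
w = np.array([0.2, 0.3, 0.5])      # an arbitrary pmf p(x)
y = np.array([-3.0, -1.0, -2.0])   # arbitrary values of y (think: log2 p(x))

E_f_y = np.sum(w * 2.0 ** y)       # E[f(y)] = E[2^y]    -> 0.3
f_E_y = 2.0 ** np.sum(w * y)       # f(E[y]) = 2^{E[y]}  -> ~0.268
print(E_f_y >= f_E_y)              # True: E[2^y] >= 2^{E[y]}

# Equality holds when y is constant:
y_const = np.full_like(y, -2.0)
print(np.sum(w * 2.0 ** y_const), 2.0 ** np.sum(w * y_const))   # 0.25 0.25
```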

Let $\phi = \log_2 p(x)$. Note that

\begin{align} 2^{-H(X)} &= 2^{\mathbb{E}_{p(x)}[\log _2 p(x)]} \\ &= f(y) \big|_{y = \mathbb{E}_{p(x)}[\log _2 p(x)]} \\ &= f(y) \big|_{y = \mathbb{E}_{p(x)}[\phi]} \\ &= f(\mathbb{E}_{p(x)}[\phi]) \\ &\leq \mathbb{E}_{p(x)} [f (\phi) ] \\ &= \mathbb{E}_{p(x)} [2^{\phi}] \\ &= \mathbb{E}_{p(x)} [2^{\log _2 p(x)}] \\ &= \mathbb{E}_{p(x)} [p(x)] \\ &= \sum_{x \in \mathcal{X}} p(x)p(x)\\ &= \sum_{x \in \mathcal{X}} p^2 (x) \\ &= P(X_1 = X_2) \end{align}

What this means is that $P(X_1 = X_2) \geq 2^{-H(X)}$, with equality if and only if $\log_2 p(x)$ is constant. If $\log_2 p(x)$ is constant, then $p(x)$ is constant; since the probabilities must sum to $1$, this forces $p(x) = 1/|\mathcal{X}|$ for every $x$, i.e. $p$ is uniform. $\square$
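To connect this back to the intuition in the question: for the uniform distribution on $n$ values, $p(x) = 1/n$ for every $x$, and the bound is attained with equality, since

$$H(p) = -\sum_{x \in \mathcal{X}} \frac{1}{n} \log_2 \frac{1}{n} = \log_2 n, \qquad 2^{-H(p)} = \frac{1}{n} = \sum_{x \in \mathcal{X}} \frac{1}{n^2} = P(X_1 = X_2).$$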