[Math] Conditional Entropy: if $H[y|x]=0$, then there exists a function $g$ such that $y=g(x)$

entropy, information-theory

Suppose that the conditional entropy $H[y|x]$ between two discrete random variables $x$ and $y$ is zero.

Show that, for all values of $x$ such that $p(x) > 0$, the variable $y$ must be a function of $x$; in other words, for each $x$ there is only one value of $y$ such that $p(y|x)\ne 0$.

Therefore, if $H[y|x]=0$, then there exists a function $g$ such that $y=g(x)$.

Entropy Venn Diagram

Would it be sufficient to prove this by looking at the Venn diagram and noticing that, if $H[y|x]=0$, then:

$$H[y] = H[x] - H[x|y]$$

Therefore, $H[y|x]=0$ if and only if the value of $y$ is completely determined by the value of $x$.
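
To spell out what I am reading off the diagram (if I understand it correctly): the mutual information is the overlap of the two circles, so

$$ I[x;y] = H[y] - H[y|x] = H[x] - H[x|y], $$

and setting $H[y|x]=0$ in the first equality gives $H[y] = I[x;y] = H[x] - H[x|y]$, which is the identity above.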

I want to say that $H[x|y]$ depends only on the variable $x$ and not on the variable $y$… but I am not sure how to prove it.

I guess the question is: how can one justify such a bold statement as the one above, namely that $y$ is a function of $x$ if and only if $H[y|x]=0$?

Best Answer

Write $\mathcal{X}$ and $\mathcal{Y}$ for the ranges of $X$ and $Y$, respectively. Write $H[Y|X]$ as

$$ H[Y|X] = \sum_{x\in\mathcal{X}} \sum_{y\in\mathcal{Y}} \left( -p(x) p(y|x) \log p(y | x) \right) \tag{1}$$

Here, $0 \log 0$ is regarded as $0$ as usual. Now assume that $H[Y|X] = 0$. Since $0 \le p(y|x) \le 1$, each summand of $\text{(1)}$ is non-negative, so each summand must vanish, and we get

$$ p(y|x) \log p(y | x) = 0 \quad \text{whenever} \quad p(x) > 0. $$

But since the function $t \mapsto t \log t$ on $[0, 1]$ has exactly two zeros $t = 0$ and $t =1$, this implies that

$$ p(y|x) \in \{0, 1\} \quad \text{whenever} \quad p(x) > 0. $$

Still assuming that $x$ satisfies $p(x) > 0$, the identity $\sum_{y\in\mathcal{Y}} p(y|x) = 1$ ensures that there exists a unique $y \in \mathcal{Y}$ such that $p(y|x) = 1$ holds. In that case, we write $y = g(x)$. We also remark that $p(y'|x) = 0$ if $y' \neq g(x)$.
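
As a quick numerical sanity check of this step (not part of the proof), here is a small Python sketch; the joint pmf below, with $X$ uniform on $\{-1,0,1\}$ and $Y = X^2$, is just a made-up example.

```python
from fractions import Fraction
from math import log

# Toy joint pmf (an assumption for illustration): X uniform on {-1, 0, 1} and Y = X**2.
p_xy = {(-1, 1): Fraction(1, 3), (0, 0): Fraction(1, 3), (1, 1): Fraction(1, 3)}
xs = sorted({x for x, _ in p_xy})
ys = sorted({y for _, y in p_xy})

def p_x(x):
    return sum(p_xy.get((x, y), Fraction(0)) for y in ys)

def p_y_given_x(y, x):
    return p_xy.get((x, y), Fraction(0)) / p_x(x)

# H[Y|X] as in (1), with the convention 0 log 0 = 0 (zero terms are simply skipped).
h = 0.0
for x in xs:
    for y in ys:
        pyx = p_y_given_x(y, x)
        if pyx > 0:
            h -= float(p_x(x) * pyx) * log(pyx)
print(h)  # 0.0, since Y is a function of X

# Every p(y|x) is 0 or 1, so g(x) is the unique y with p(y|x) = 1.
g = {x: next(y for y in ys if p_y_given_x(y, x) == 1) for x in xs if p_x(x) > 0}
print(g)  # {-1: 1, 0: 0, 1: 1}
```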

To extend $g$ to a function on all of $\mathcal{X}$, assign $g(x)$ an arbitrary value in $\mathcal{Y}$ whenever $p(x) = 0$. (Such an assignment never harms the argument, since those $x$ occur with probability zero.) This yields a function $g : \mathcal{X} \to \mathcal{Y}$. We claim that

Claim. $Y = g(X)$ with probability one.

Indeed, recall that $g$ is defined so as to satisfy

$$ p(y|x) = \mathbf{1}_{\{y = g(x)\}} := \begin{cases} 1, & \text{if $y = g(x)$} \\ 0, & \text{if $y \neq g(x)$} \end{cases} \tag{2} $$

whenever $p(x) > 0$. (Here, indicator function notation is used.) From this,

\begin{align*} \mathbb{P}(Y = g(X)) &= \mathbb{E}[\mathbf{1}_{\{Y = g(X)\}}] \\ &= \sum_{x\in\mathcal{X}} \sum_{y\in\mathcal{Y}} p(x,y) \mathbf{1}_{\{y = g(x)\}} \\ &= \sum_{x\in\mathcal{X}} p(x)\sum_{y\in\mathcal{Y}} p(y|x) \mathbf{1}_{\{y = g(x)\}} \\ &\stackrel{\text{(2)}}= \sum_{x\in\mathcal{X}} p(x)\sum_{y\in\mathcal{Y}} p(y|x) \\ &= \sum_{x\in\mathcal{X}} \sum_{y\in\mathcal{Y}} p(x, y) \\ &= 1. \end{align*}
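
On the same toy example as above, this final chain of equalities can be checked directly (again, just an illustration, not part of the proof):

```python
from fractions import Fraction

# Same toy example (an assumption for illustration): X uniform on {-1, 0, 1},
# Y = X**2, and g is the function read off from p(y|x).
p_xy = {(-1, 1): Fraction(1, 3), (0, 0): Fraction(1, 3), (1, 1): Fraction(1, 3)}
g = {-1: 1, 0: 0, 1: 1}

# P(Y = g(X)) = sum over (x, y) of p(x, y) * 1{y = g(x)}
prob = sum(p for (x, y), p in p_xy.items() if y == g[x])
print(prob)  # 1, i.e. Y = g(X) with probability one
```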
