Conditional Entropy of Lossy Channel Output

entropy, information theory, probability

I've seen the following claim in several sources, all of which treat it as completely obvious, but I am unable to prove it, probably because I lack basic knowledge of probability and information theory, which I am currently trying to learn on my own.

Claim:

Let $N\in\mathbb{N}$, $N>0$, and let $X$ be a random variable taking values in $\{0, 1\}^N$ (possibly further implicit assumptions are needed?). This random variable is transmitted through a lossy channel that flips each individual bit independently with probability $p_e$ (the transmission is memoryless). The result of the transmission is $Y$. Then the conditional (Shannon) entropy of $X$ given $Y$ is
$$
H(X|Y) = N\,h(p_e) = N \left(-p_e \log(p_e) - (1-p_e)\log(1-p_e) \right),
$$

where $h(p_e)$ is the Shannon entropy of a Bernoulli trial with parameter $p_e$.
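
To make the channel model precise (this is my reading of "flips each bit independently with probability $p_e$"; I believe it is what is usually called a binary symmetric channel), one can write

$$
Y_i = X_i \oplus Z_i, \qquad Z_1,\ldots,Z_N \stackrel{\text{i.i.d.}}{\sim} \operatorname{Bernoulli}(p_e), \qquad (Z_1,\ldots,Z_N) \text{ independent of } X,
$$

where $\oplus$ denotes addition modulo $2$.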

My Attempt:

I tried to use the definition of conditional entropy:
$$
H(X|Y) = -\sum_{x,y\in\{0,1\}^N} p(x,y) \log\frac{p(x,y)}{p(y)}.
$$

For $x,y\in\{0,1\}^N$ let $k$ be the Hamming distance between $x$ and $y$, i.e., the number of bits in which $x$ and $y$ differ. Then $p(x,y)= p(x)\binom{N}{k}p_e^k(1-p_e)^{N-k}$. However, if I try to plug this into the formula and somehow convert the sum into a sum over $k$ (which I'm not sure is correct in the first place), I get something like the entropy of a binomial distribution, for which no simple closed form is known. Clearly I'm doing something wrong. How would you prove the above claim (while possibly adding the "obvious" implicit assumptions)? Thanks in advance! (A numerical sanity check of the claim is included below.)
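
For small $N$ the claim can be checked by brute force. The following Python sketch assumes $X$ is uniform on $\{0,1\}^N$ (which I suspect is one of the implicit assumptions) and compares the conditional entropy computed from its definition with $N\,h(p_e)$:

```python
from itertools import product
from math import log2

# Brute-force check of H(X|Y) = N * h(p_e) for the bit-flipping channel,
# assuming X is uniform on {0,1}^N. Entropies are in bits (log base 2).
N, p_e = 3, 0.1

def h(p):
    """Binary (Shannon) entropy of a Bernoulli(p) trial."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

strings = list(product([0, 1], repeat=N))

def p_joint(x, y):
    """Joint probability p(x, y) = p(x) * p(y | x), with p(x) = 2^(-N) and
    p(y | x) = p_e^k * (1 - p_e)^(N - k), k = Hamming distance of x and y."""
    k = sum(xi != yi for xi, yi in zip(x, y))
    return (0.5 ** N) * (p_e ** k) * ((1 - p_e) ** (N - k))

# Marginal p(y) = sum_x p(x, y).
p_y = {y: sum(p_joint(x, y) for x in strings) for y in strings}

# Definition of conditional entropy: H(X|Y) = -sum_{x,y} p(x,y) log2(p(x,y)/p(y)).
H_X_given_Y = -sum(
    p_joint(x, y) * log2(p_joint(x, y) / p_y[y])
    for x in strings for y in strings
)

print(H_X_given_Y, N * h(p_e))  # both print approximately 1.407
```

The two numbers agree, so the problem seems to be with my manipulation of the sum rather than with the claim itself.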

Best Answer

Denoting $X=(X_1, X_2, \ldots, X_N)$ and similarly for $Y$, note that, since the channel is memoryless and the input bits $X_1,\ldots,X_N$ are independent (this is presumably one of the implicit assumptions you were asking about; the sources most likely take $X$ to be uniform on $\{0,1\}^N$), the pairs $(X_1,Y_1),\ldots,(X_N,Y_N)$ are independent, and in particular

$$
\mathbb{P}(X=(x_1,x_2,\ldots,x_N)|Y=(y_1,y_2,\ldots,y_N))=\prod_{i=1}^N \mathbb{P}(X_i=x_i|Y_i=y_i).
$$

From fundamental properties of the entropy, it follows that

$$
\begin{aligned}
H(X|Y)&=\sum_{i=1}^N H(X_i|Y_i)\\
&=N\, H(X_i|Y_i).
\end{aligned}
$$
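
One way to see the first equality (a sketch, using the chain rule for entropy and the independence of the pairs $(X_i,Y_i)$):

$$
H(X|Y)=\sum_{i=1}^N H(X_i\,|\,Y, X_1,\ldots,X_{i-1})=\sum_{i=1}^N H(X_i|Y_i),
$$

where the second step holds because $(X_i,Y_i)$ is independent of all the other pairs, so conditioning on anything built from them does not change the conditional distribution of $X_i$ given $Y_i$.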

You have probably seen the first equality for the unconditional entropy of independent random variables; the sketch above indicates why it also holds in the conditional form used here. The second equality holds because the channel treats each bit in the same way. It remains to show that $H(X_i|Y_i)=h(p_e)$.
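
A sketch of this last step, under the uniform-input assumption: given $Y_i=y$, the bit $X_i$ differs from $y$ exactly when the channel flipped it, and (using that $X_i$ is uniform) $\mathbb{P}(X_i\neq y\,|\,Y_i=y)=p_e$. Hence

$$
H(X_i|Y_i)=\sum_{y\in\{0,1\}}\mathbb{P}(Y_i=y)\,h\!\left(\mathbb{P}(X_i\neq y\,|\,Y_i=y)\right)=\sum_{y\in\{0,1\}}\mathbb{P}(Y_i=y)\,h(p_e)=h(p_e).
$$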