[Math] Minimum and Maximum Capacity of a channel


There is this question in the Cover and Thomas book "Elements of Information Theory".

Noise alphabets: Consider the channel Y = X + Z, where X ∈ {0, 1, 2, 3} and Z is uniformly distributed over three distinct integer values {z1, z2, z3}.

(a) What is the maximum capacity over all choices of the Z alphabet?
Give distinct integer values z1, z2, z3 and distribution on X achieving this.

(b) What is the minimum capacity over all choices for the Z alphabet?
Give distinct integer values z1, z2, z3 and distribution on X achieving this.

The solution given was

(a) Maximum capacity is C = 2 bits. Z = {10, 20, 30} and p(X) = (1/4, 1/4, 1/4, 1/4)

(b) Minimum capacity is C = 1 bit. Z = {0, 1, 2} and p(X) = (1/2, 0, 0, 1/2)

I don't quite understand how to arrive at the solution.

My thoughts for this question are as follows.

The maximum capacity is $C = \max I(X + Z; Y)$ over $P_X$. Therefore, $C = H(Y) - H(Y \mid X+Z)$, and $H(Y \mid X+Z) = 0$. So I should maximize $H(Y)$, but I'm not sure how to do that.

Alternatively, $C = \max I(X; Y-Z) = H(X) - H(X \mid Y-Z)$. Similarly, $H(X \mid Y-Z) = 0$, so $\max H(X) = \log_2 4 = 2$, so the maximum capacity is 2 bits. And this can be achieved when $P_X$ is the uniform distribution, i.e. $P_X = (1/4, 1/4, 1/4, 1/4)$. Is my understanding correct?
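
As a quick numerical sanity check on both of the book's answers, here is a rough Python sketch that just computes $I(X;Y) = H(Y) - H(Y \mid X)$ directly for the two stated choices (the helper function is my own):

```python
import numpy as np
from collections import defaultdict

def mutual_information_bits(p_x, x_vals, z_vals):
    """I(X;Y) in bits for Y = X + Z, Z uniform over z_vals, X distributed as p_x on x_vals."""
    joint = defaultdict(float)                       # joint[(x, y)] = P(X = x, Y = y)
    for px, x in zip(p_x, x_vals):
        for z in z_vals:
            joint[(x, x + z)] += px / len(z_vals)

    p_y = defaultdict(float)                         # marginal distribution of Y
    for (x, y), pxy in joint.items():
        p_y[y] += pxy

    h_y = -sum(p * np.log2(p) for p in p_y.values() if p > 0)
    # H(Y|X): given X = x, Y is uniform over the len(z_vals) distinct values x + z
    h_y_given_x = sum(px * np.log2(len(z_vals)) for px in p_x if px > 0)
    return h_y - h_y_given_x

x_vals = [0, 1, 2, 3]
print(mutual_information_bits([0.25, 0.25, 0.25, 0.25], x_vals, [10, 20, 30]))  # ~2.0 bits
print(mutual_information_bits([0.5, 0.0, 0.0, 0.5], x_vals, [0, 1, 2]))         # ~1.0 bit
```

This confirms the numbers, but not where they come from.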

I don't understand why those values of Z were chosen, and how the minimum capacity is obtained. For example, why can't the capacity be 0 bits? I know it makes no sense to have a channel and not send any bits, but isn't 0 a valid minimum capacity?

Best Answer

Answering your questions on the minimisation:

We can argue from the structure of the problem that, to minimise the capacity, the alphabet of $Z$ must be $\{a, a+1, a+2\}$ for some integer $a$ (ask me about this if it is unclear). Set $a = 0$ for convenience, since the decoder knows $a$ and can always subtract it.
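
If you'd like to check that claim numerically rather than take it on faith, here is a rough sketch that estimates the capacity for a few candidate noise alphabets using the Blahut-Arimoto algorithm (the function, its defaults and the candidate list are my own; this only approximates the capacity, it proves nothing). Consecutive alphabets should come out lowest, at about 1 bit, and shifting by a constant $a$ changes nothing.

```python
import numpy as np

def capacity_bits(z_vals, x_vals=(0, 1, 2, 3), iters=2000):
    """Approximate capacity (bits) of Y = X + Z, Z uniform over z_vals, via Blahut-Arimoto."""
    y_vals = sorted({x + z for x in x_vals for z in z_vals})
    W = np.zeros((len(x_vals), len(y_vals)))          # W[i, j] = P(Y = y_vals[j] | X = x_vals[i])
    for i, x in enumerate(x_vals):
        for z in z_vals:
            W[i, y_vals.index(x + z)] += 1.0 / len(z_vals)

    p = np.full(len(x_vals), 1.0 / len(x_vals))       # start from the uniform input distribution
    for _ in range(iters):
        q = p @ W                                      # induced output distribution
        # d[i] = D( W(.|x_i) || q ) in nats, then the multiplicative Blahut-Arimoto update
        d = np.array([np.sum(w[w > 0] * np.log(w[w > 0] / q[w > 0])) for w in W])
        p = np.maximum(p * np.exp(d), 1e-15)           # tiny floor avoids exact zeros
        p /= p.sum()

    q = p @ W
    mask = W > 0
    Q = np.broadcast_to(q, W.shape)
    # mutual information (bits) at the final input: a lower bound converging to the capacity
    mi = np.sum((p[:, None] * W)[mask] * np.log2(W[mask] / Q[mask]))
    return mi, p

# Consecutive alphabets should come out lowest (about 1 bit); {10, 20, 30} gives 2 bits.
for z in [(0, 1, 2), (5, 6, 7), (0, 1, 3), (0, 2, 4), (10, 20, 30)]:
    c, p = capacity_bits(z)
    print(z, round(c, 3), np.round(p, 3))   # p is an approximately capacity-achieving input
```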

Now, given this noise, we need to find the distribution $p_X$ that maximises $I(X;Y)$. I'll try to keep things fully analytic, no jumps of logic, no inspired guesswork.

$I(X;Y) = H(Y) - H(Y|X) = H(Y) - H(X+Z|X) = H(Y) - H(Z) = H(Y) - \log_2 3$

So we're looking for the distribution of $X$: $(p_0, p_1, p_2, p_3)$ that maximises $H(Y)$. There's a dirty old analytic way to do this, using Lagrange multipliers. (Note, I'll be doing this in natural logs, because they don't leave a dirty constant. This doesn't make a difference to the maximiser.)

Now, the distribution of $Y$, call it $(q_0, q_1, \dots, q_5)$, in terms of that of $X$ is as follows (convince yourself of this):

$\left(p_0/3, (p_0 + p_1)/3, (p_0+p_1+p_2)/3, (p_1+p_2+p_3)/3, (p_2+p_3)/3, p_3/3 \right)$
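
If you want a quick way to convince yourself of this, note that the pmf of $Y$ is just the convolution of the pmf of $X$ with the uniform pmf of $Z$. A small numerical check (a sketch, with an arbitrary example pmf):

```python
import numpy as np

p_x = np.array([0.1, 0.2, 0.3, 0.4])   # any pmf on {0, 1, 2, 3}
p_z = np.full(3, 1/3)                  # Z uniform on {0, 1, 2}
p_y = np.convolve(p_x, p_z)            # pmf of Y = X + Z on {0, ..., 5}
print(p_y)
# matches (p0/3, (p0+p1)/3, (p0+p1+p2)/3, (p1+p2+p3)/3, (p2+p3)/3, p3/3)
```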

We want to run the optimisation: $$ \max_{(p_0, p_1, p_2,p_3)} \sum_{y=0}^5 -q_y\ln q_y$$ $$\text{Subject to } p_0 + p_1 + p_2 + p_3 = 1$$

Define $J = \sum_{y=0}^5 -q_y\ln q_y + \lambda (p_0 + p_1 + p_2 + p_3 - 1)$

At the optimum (absorbing the constant $\ln 3$ coming from each $1/3$ factor into $\lambda$):

$$\partial_{p_0} (J) = 0 \implies \ln \left( (p_0)(p_1+p_0)(p_2+p_1+p_0) \right) = 3\lambda - 3 \tag{1}$$

$$\partial_{p_1} (J) = 0 \implies \ln \left((p_1+p_0)(p_2+p_1+p_0)(p_3+p_2+p_1)\right) = 3\lambda - 3 \tag{2}$$

$$\partial_{p_2} (J) = 0 \implies \ln \left((p_2+p_1+p_0)(p_3+p_2+p_1)(p_3+p_2)\right) = 3\lambda - 3 \tag{3}$$

$$\partial_{p_3} (J) = 0 \implies \ln \left((p_3)(p_3+p_2)(p_3+p_2+p_1)\right) = 3\lambda - 3 \tag{4}$$

$$\partial_{\lambda}(J) = 0 \implies p_0 + p_1 + p_2 + p_3 = 1 \tag{5}$$

This is not so bad to calculate. Look at all the terms with $p_0$ in them. When a product like $(p_0+p_1)\ln(p_0+p_1)$ is differentiated, differentiating the log leaves you with a $1$, and differentiating the outer $p_0 + p_1$ leaves you with just the log.
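
Concretely, take the term of $J$ that involves $q_1 = (p_0+p_1)/3$:

$$\partial_{p_0}\left[-\frac{p_0+p_1}{3}\,\ln\frac{p_0+p_1}{3}\right] = -\frac{1}{3}\ln\frac{p_0+p_1}{3} - \frac{1}{3}$$

Doing the same for the terms containing $q_0$ and $q_2$ (the only other terms in which $p_0$ appears), summing, and adding the $\lambda$ from the constraint gives equation $(1)$, with the constants folded into $\lambda$ as noted above.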

Now, $(1) - (2) = 0$. This implies:

$$\ln \frac{p_0}{p_1 + p_2 + p_3} = 0 \iff p_0 = p_1 + p_2 + p_3 \tag{$\alpha$}$$

$(4) - (3) = 0$. This implies:

$$\ln \frac{p_3}{p_0 + p_1 + p_2 } = 0 \iff p_3 = p_0 + p_1 + p_2 \tag{$\beta$}$$

Feed the two equations above into equation $(5)$: from $(\alpha)$ and $(5)$, $p_0 = 1 - p_0$, so $p_0 = 0.5$; likewise $(\beta)$ gives $p_3 = 0.5$, and hence $p_1 = p_2 = 0$.

With this input distribution, the distribution of $Y$ is uniform over its 6 possible values.

Therefore, $C = \max\limits_{p_X} I(X;Y) = \log_2 6 - \log _2 3 = 1$. Phew.
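
If you'd rather not trust the Lagrange bookkeeping, here is a small numerical cross-check (a sketch, not a proof: it verifies that $(1/2, 0, 0, 1/2)$ makes $Y$ uniform and that randomly drawn input distributions don't beat it; the helper names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def h_bits(q):
    q = q[q > 0]
    return -np.sum(q * np.log2(q))

def h_y(p_x):
    # H(Y) in bits for Y = X + Z, with Z uniform on {0, 1, 2}
    return h_bits(np.convolve(p_x, np.full(3, 1/3)))

p_star = np.array([0.5, 0.0, 0.0, 0.5])
print(h_y(p_star), np.log2(6))          # both ~2.585: Y is uniform over its 6 values
print(h_y(p_star) - np.log2(3))         # the capacity, ~1.0 bit

# no randomly drawn input distribution does better (up to floating-point slack)
best = max(h_y(rng.dirichlet(np.ones(4))) for _ in range(20000))
print(best <= h_y(p_star) + 1e-9)       # True
```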


Try to see what this is doing structurally. Suppose $X$ were uniform; then, under this noise, I would be able to tell when the input was $0$ or $3$ with much greater accuracy than when it was $1$ or $2$. Why? Because whenever $Y = 0$ I know that the input is $0$, and similarly $Y = 5$ forces $X = 3$. So a 'good' distribution for communication under this noise would be weighted more towards $0$ and $3$.
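
To see this concretely, here is the posterior $p(x \mid y)$ under a uniform input (a quick sketch):

```python
import numpy as np

p_x = np.full(4, 0.25)                 # uniform input on {0, 1, 2, 3}
joint = np.zeros((4, 6))               # joint[x, y] = P(X = x, Y = y), Z uniform on {0, 1, 2}
for x in range(4):
    for z in range(3):
        joint[x, x + z] += p_x[x] / 3

posterior = joint / joint.sum(axis=0)  # P(X = x | Y = y), columns indexed by y
print(np.round(posterior, 2))
# column y = 0 is (1, 0, 0, 0) and column y = 5 is (0, 0, 0, 1): those outputs identify X exactly;
# the middle columns spread the probability over two or three inputs
```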


To answer the other questions you raised:

  1. Capacity can get as close to zero as you like, sure. For example, take $Z$ to be uniform over $\{ 1, 2, \dots, N\}$. Since $Y$ then takes at most $N+3$ values while $H(Y \mid X) = \log_2 N$, the capacity is at most $\log_2 \frac{N+3}{N} = \log_2(1 + 3/N)$, which heads to $0$ as $N \to \infty$.

  2. What you're trying to do here is this: you know that your noise is uniform over 3 letters, but you don't know which letters. Suppose the noise were the worst possible; at what rate could I still communicate? That is the minimum capacity: the smallest of the maximum rates at which I could communicate as the noise alphabet varies.

  3. (From your comments) You've calculated $H(Y|X)$ incorrectly.

$H(Y|X) = H(X+Z|X) = H(Z|X) = H(Z)$

The second equality holds because, given $X$, I can always subtract it off with 100% accuracy (so $X+Z$ and $Z$ carry the same information once $X$ is known), and the last equality holds because $Z$ and $X$ are independent.
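
For what it's worth, you can also check that identity numerically for an arbitrary input distribution (a sketch, with $Z$ uniform on $\{0,1,2\}$ and an input pmf I made up):

```python
import numpy as np

p_x = np.array([0.1, 0.2, 0.3, 0.4])   # any input pmf on {0, 1, 2, 3}
joint = np.zeros((4, 6))               # joint[x, y] = P(X = x, Y = y), Z uniform on {0, 1, 2}
for x in range(4):
    for z in range(3):
        joint[x, x + z] += p_x[x] / 3

def h_bits(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# H(Y | X) = H(X, Y) - H(X); it should equal H(Z) = log2(3) no matter what p_x is
print(h_bits(joint.ravel()) - h_bits(p_x), np.log2(3))   # both ~1.585 bits
```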
