Entropy rate of a finite repetition

entropy, information theory

Suppose we generate the following sequence by repeatedly picking a letter uniformly at random from the 26-letter alphabet:

LLL EEE HHH QQQ MMM QQQ OOO TTT EEE YYY XXX GGG…

So the first letter of every group of 3 letters is drawn independently and uniformly (each letter has probability $\frac{1}{26}$), and the next two letters are deterministic: they simply repeat the letter drawn at the first position of the group.
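For concreteness, this is the kind of process I have in mind (a minimal Python sketch of my own; the name `generate_sequence` is just for illustration):

```python
import random
import string

def generate_sequence(num_groups):
    """Pick a letter uniformly from the 26-letter alphabet and repeat it 3 times per group."""
    groups = []
    for _ in range(num_groups):
        letter = random.choice(string.ascii_uppercase)  # each letter with probability 1/26
        groups.append(letter * 3)                        # the two repeats are deterministic
    return " ".join(groups)

print(generate_sequence(12))  # e.g. LLL EEE HHH QQQ MMM ...
```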

The entropy of the first symbol of each group seems to me to be $\log 26$ (base 2 throughout), since we need $\log 26$ bits to describe it, and $0$ for the two letters after it, since they provide no new information and therefore cost no additional bits.

So after adding the letter of the second group (in the example, the letter "E"), I would need $2\log 26$ bits to describe the first $6$ letters of the sequence; I could therefore compress my sequence by a factor of $3$, and my entropy rate would be $\frac{\log26}{3}$.
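To convince myself of this bit-counting argument, here is a small numerical check (my own sketch, assuming an ideal code that spends exactly $\log 26$ bits per group and nothing else):

```python
from math import log2, ceil

bits_per_group = log2(26)  # ~4.70 bits for the independently drawn first letter

for n in (6, 30, 300, 3_000, 300_000):
    bits_needed = ceil(n / 3) * bits_per_group  # only group-initial letters cost bits
    print(f"n = {n:7d}: {bits_needed / n:.4f} bits/symbol")

print(f"log2(26)/3 = {log2(26) / 3:.4f}")  # the conjectured entropy rate, ~1.5668
```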

But I'm lost on the theoretical derivation of this.
It says the entropy rate is defined as:

$$H(X) = \lim_{n\to\infty} \frac{1}{n}H(X_1, X_2, X_3, \dots, X_n)$$

Then it says that $H(X)$ is a measure of the entropy per symbol of the variables $X_t$, and the joint entropy $H(X_1, X_2, \dots, X_n)$ grows like:

$$\log26 + 0 + 0 + \log26 + 0 + 0+\dots+ \log26 + 0 + 0$$
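Spelled out, the joint entropy is a staircase: it jumps by $\log 26$ at every group-initial symbol and is flat in between. A quick sketch of the first few values (again, my own illustration):

```python
from math import log2, ceil

log26 = log2(26)

# H(X_1, ..., X_n) jumps by log 26 at positions 1, 4, 7, ... and is flat in between.
for n in range(1, 13):
    H_n = ceil(n / 3) * log26
    print(f"n = {n:2d}: H(X_1,...,X_n) = {H_n:6.3f} bits  ({ceil(n / 3)} x log 26)")
```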

Now it says this sum is "squeezed" between two bounds:

$$\frac{\log26}{3}\,n \le H(X_1, X_2, X_3, \dots, X_n) \le \frac{\log26}{3}(n+2)$$

Dividing both bounds by $n$ and letting $n \to \infty$, I see that the entropy rate is indeed $\frac{\log26}{3}$.
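A quick numerical check of this squeeze (my own verification, nothing more):

```python
from math import log2, ceil

rate = log2(26) / 3  # conjectured entropy rate

for n in (1, 2, 3, 10, 100, 10_000, 1_000_000):
    H_n = ceil(n / 3) * log2(26)  # joint entropy of the first n symbols
    lower, upper = rate * n, rate * (n + 2)
    assert lower - 1e-9 <= H_n <= upper + 1e-9  # squeeze holds (tolerance for float rounding)
    print(f"n = {n:8d}: H_n/n = {H_n / n:.6f}   (rate = {rate:.6f})")
```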

But could someone explain the intuition behind how $H(X_1, X_2, \dots, X_n)$ behaves and how these bounds are established?

Best Answer

Sketch:

In general, $n$ can be written as $$n = 3m + r$$ where $m \ge 0$ is an integer and $r \in \{1,2,3\}$: $r$ is the remainder of $n$ divided by $3$, except that $r = 3$ (and $m$ drops by one) when $n$ is a multiple of $3$.

Then $$H_n := H(X_1, X_2, \dots, X_n) = (m+1)\log 26 = \left(\frac{n+3-r}{3}\right)\log 26,$$ since only the $m+1$ group-initial symbols among $X_1, \dots, X_n$ carry any uncertainty.

Can you go on from here?
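(For what it's worth, a quick numerical illustration of that decomposition, as my own sketch rather than part of the hint above:)

```python
from math import log2

log26 = log2(26)

for n in (1, 2, 3, 4, 5, 100, 10_000, 1_000_000):
    m, r = divmod(n, 3)
    if r == 0:          # shift the r = 0 case so that r lies in {1, 2, 3}
        m, r = m - 1, 3
    H_n = (m + 1) * log26               # = ((n + 3 - r) / 3) * log 26
    print(f"n = {n:8d}: m = {m:7d}, r = {r}, H_n/n = {H_n / n:.6f}")

print(f"log2(26)/3 = {log26 / 3:.6f}")  # the limit of H_n/n
```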
