Why is $q$-ary entropy defined as such


The $q$-ary entropy, for an integer $q \geq 2$ and $x \in [0,1]$, is defined as

$$H_q(x)=x \log _q(q-1)-x \log _q(x)-(1-x) \log _q(1-x) .$$

One recovers the binary entropy with $q=2$. What is the motivation behind this definition?
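Not part of the original question, but here is a minimal numerical sketch of the definition (the helper names `Hq` and `h2` are just illustrative), checking that $q = 2$ recovers the binary entropy:

```python
import math

def Hq(x: float, q: int) -> float:
    """q-ary entropy: x*log_q(q-1) - x*log_q(x) - (1-x)*log_q(1-x)."""
    if x == 0.0:
        return 0.0                      # convention: 0*log(0) = 0
    if x == 1.0:
        return math.log(q - 1, q)       # only the x*log_q(q-1) term survives
    return (x * math.log(q - 1, q)
            - x * math.log(x, q)
            - (1 - x) * math.log(1 - x, q))

def h2(x: float) -> float:
    """Binary entropy function."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

# q = 2 recovers the binary entropy, since log_2(q - 1) = log_2(1) = 0.
for x in (0.1, 0.25, 0.5, 0.9):
    assert abs(Hq(x, 2) - h2(x)) < 1e-9
```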

Currently, I understand the binary entropy to be the quantity that tells you the best rate at which one can do source coding. That is, given i.i.d. copies of a binary random variable $X$, one needs to compress at a rate of $H_2(X)$ bits per symbol to do source coding optimally in the asymptotic limit. If $X$ were a random variable on a $q$-ary alphabet, does $H_q(X)$ have the same operational meaning?
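As an aside not stated in the post, $H_q(x)$ does arise as an ordinary entropy: it is the Shannon entropy, measured in $q$-ary symbols (log base $q$), of the distribution that puts mass $1-x$ on one symbol and $x/(q-1)$ on each of the remaining $q-1$ symbols. A quick numerical check of that identity (helper names are illustrative):

```python
import math

def entropy_base_q(p, q):
    """Shannon entropy of the distribution p, in log base q (q-ary symbols)."""
    return -sum(pi * math.log(pi, q) for pi in p if pi > 0)

q, x = 5, 0.3
# Mass 1-x on one symbol, x/(q-1) on each of the other q-1 symbols.
p = [1 - x] + [x / (q - 1)] * (q - 1)

Hq_x = (x * math.log(q - 1, q) - x * math.log(x, q)
        - (1 - x) * math.log(1 - x, q))
assert abs(entropy_base_q(p, q) - Hq_x) < 1e-9
```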

Best Answer

Of course, things will depend on the precise perspective you take, but here's a pretty natural view from channel coding. Take a $q$-ary symmetric channel, i.e., $$ P(Y = x' \mid X = x) = \begin{cases} 1-\pi & x' = x,\\ \pi/(q-1) & x' \neq x.\end{cases}$$

The capacity of this channel is $$ \max_{p_X} H(Y) - H(Y|X).$$ Here $H(Y|X)$ is a constant, independent of the input law $p_X$: $$ H(Y|X) = -(1-\pi) \log (1-\pi) + \pi\log(q-1) - \pi \log\pi.$$
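A sketch, not from the answer itself, that checks this expression numerically (in nats, since the answer leaves the log base unspecified): $H(Y|X)$ is just the entropy of a single row of the channel matrix, and dividing it by $\log q$ gives exactly $H_q(\pi)$. Variable names here are purely illustrative.

```python
import math

q, pi = 7, 0.15

# One row of the q-ary symmetric channel: P(Y = x | X = x) = 1 - pi,
# P(Y = x' | X = x) = pi / (q - 1) for each of the q - 1 symbols x' != x.
row = [1 - pi] + [pi / (q - 1)] * (q - 1)

# H(Y|X) is the entropy of this row (the same for every input x), in nats.
H_Y_given_X = -sum(p * math.log(p) for p in row if p > 0)

closed_form = -(1 - pi) * math.log(1 - pi) + pi * math.log(q - 1) - pi * math.log(pi)
assert abs(H_Y_given_X - closed_form) < 1e-9

# Normalising by log(q) gives exactly the q-ary entropy H_q(pi).
Hq_pi = (pi * math.log(q - 1, q) - pi * math.log(pi, q)
         - (1 - pi) * math.log(1 - pi, q))
assert abs(H_Y_given_X / math.log(q) - Hq_pi) < 1e-9
```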

Of course, placing the uniform law on $X$ further induces the uniform law on $Y$, so the capacity is $$ \log q - \pi \log(q-1) + \pi \log \pi + (1-\pi) \log(1-\pi).$$ Normalising by $\log q$, the rate needed to communicate a uniform $q$-ary symbol, we see that the capacity satisfies $$ C = \log q \left( 1 - \left(\pi \log_q(q-1) - \pi \log_q(\pi) - (1-\pi) \log_q(1-\pi)\right)\right) = \log q \left(1 - H_q(\pi)\right).$$
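Here is a small numerical sketch of that claim, not part of the original answer: with the uniform input law, $I(X;Y)$ equals $\log q\,(1 - H_q(\pi))$, and a handful of random input laws do no better (the names `W`, `px`, etc. are just for this sketch).

```python
import math, random

q, pi = 4, 0.2
# Transition matrix of the q-ary symmetric channel.
W = [[1 - pi if y == x else pi / (q - 1) for y in range(q)] for x in range(q)]

def mutual_information(px, W):
    """I(X;Y) = H(Y) - H(Y|X) in nats, for input law px and channel matrix W."""
    n = len(W)
    py = [sum(px[x] * W[x][y] for x in range(n)) for y in range(n)]
    H_Y = -sum(p * math.log(p) for p in py if p > 0)
    H_Y_given_X = -sum(px[x] * W[x][y] * math.log(W[x][y])
                       for x in range(n) for y in range(n) if W[x][y] > 0)
    return H_Y - H_Y_given_X

Hq_pi = (pi * math.log(q - 1, q) - pi * math.log(pi, q)
         - (1 - pi) * math.log(1 - pi, q))
C = math.log(q) * (1 - Hq_pi)

# Uniform input achieves log(q) * (1 - H_q(pi)).
uniform = [1 / q] * q
assert abs(mutual_information(uniform, W) - C) < 1e-9

# Random input laws should not exceed it.
random.seed(0)
for _ in range(100):
    w = [random.random() for _ in range(q)]
    px = [wi / sum(w) for wi in w]
    assert mutual_information(px, W) <= C + 1e-9
```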

Compare this to the capacity of a BSC, which is $1 - h_2(\pi)$, and you see the similarity. The point, of course, is that if you're working with a $q$-ary alphabet, the $q$-ary symmetric channel is a basic model that you'll deal with often, so it's useful to have a generalisation of $h_2(\pi)$ that is pertinent here.
