Besides the obvious difference pointed out in the comments by @Joe:
I think the key difference is that the KL divergence measures the relative difference between two probability measures defined on the same space (recall that absolute continuity, $p \ll q$, is required for the KL divergence to be finite), while the quantity you mention is just a difference between the entropies of two possibly completely unrelated distributions, which gives no clue about how close the distributions are to each other.
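To illustrate that point concretely, here is a minimal Python sketch (the two distributions are arbitrary, chosen only so that one is a permutation of the other): their entropies are identical, so the difference of entropies is zero, yet the KL divergence is far from zero.

```python
import math

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """KL divergence D(p || q) in bits; requires q > 0 wherever p > 0 (p << q)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# q is just a permutation of p, so both have exactly the same entropy.
p = [0.7, 0.2, 0.1]
q = [0.1, 0.2, 0.7]

print(entropy(p) - entropy(q))  # 0.0   -- says nothing about how close p and q are
print(kl_divergence(p, q))      # ~1.68 -- the distributions are actually far apart
```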
We have the following closely related notions:
- entropy (the information value)
- probability distribution (which outcomes should we already expect?)
- uncertainty (are we certain of the outcome, or will we learn something?)
Low entropy
When we receive a highly expected piece of information, we were already almost certain of its content, so we gain hardly any information value. Hence high probability, low uncertainty, low entropy.
Similarly, when we do NOT receive a very unexpected piece of information, we were almost certain not to receive it, so NOT receiving it carries little information value. Hence low probability (of the surprising event), low uncertainty, low entropy.
High entropy
When we receive a highly unpredictable, uniformly random, "flip of a coin"-like piece of information, we could not anticipate it and were quite uncertain of what it would be. The information value is very high! Hence roughly 50/50 probability, high uncertainty, high entropy.
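To put rough numbers on this intuition, here is a small Python sketch (the probabilities are purely illustrative); the "surprise" of a single outcome with probability $p$ is $-\log_2 p$:

```python
import math

def surprise(p):
    """Self-information of an outcome with probability p, in bits."""
    return -math.log2(p)

for p in (0.99, 0.5, 0.01):
    print(f"p = {p}: {surprise(p):.3f} bits")
# p = 0.99: 0.014 bits  (highly expected  -> we learn almost nothing)
# p = 0.5: 1.000 bits   (fair coin flip   -> exactly 1 bit)
# p = 0.01: 6.644 bits  (very unexpected  -> we learn a lot)
```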
Example
Suppose you were to guess an English word. Then consider the expected value of getting answers to the following questions:
- Does it contain the letter "E"?
- Does it contain the letter "Z"?
You should expect the answers to be "MAYBE" and "NO". Suppose, for the sake of the example, that a randomly chosen English word has a probability of around $p\approx1/8=12.5\%$ of containing the letter "E", whereas "Z" is quite rare (say $p\approx1/64$). Using those figures, we have:
$$
\begin{align}
I[E] &= -1/8\cdot\log_2(1/8) = 3/8 &&= 0.375\\
I[\text{not } E] &= -7/8\cdot\log_2(7/8) &&\approx 0.169\\
H[E,\ \text{not } E] &= I[E] + I[\text{not } E] &&\approx 0.544
\end{align}
$$
and
$$
\begin{align}
I[Z] &= -1/64\cdot\log_2(1/64) = 6/64 &&\approx 0.094\\
I[\text{not } Z] &= -63/64\cdot\log_2(63/64) &&\approx 0.022\\
H[Z,\ \text{not } Z] &= I[Z] + I[\text{not } Z] &&\approx 0.116
\end{align}
$$
Hence we will most likely learn something from an answer to the first question, which splits our candidate words into portions of roughly 1/8 and 7/8, whereas we will probably not learn much from confirming that "Z" is not contained in the word, since that only excludes a very small set of candidates (about 1 in 64).
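For what it's worth, these figures can be reproduced numerically; here is a short Python sketch using the same assumed probabilities of $1/8$ and $1/64$:

```python
import math

def binary_entropy(p):
    """Entropy in bits of a yes/no question whose answer is "yes" with probability p."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(1 / 8))   # ~0.544 bits -- "Does it contain the letter E?"
print(binary_entropy(1 / 64))  # ~0.116 bits -- "Does it contain the letter Z?"
```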
Ideal type of question
A yes/no or true/false question like this has the potential to bisect the candidate space into equal parts: if we could ask the right question and be sure to either include or exclude exactly half of the candidate words, we would gain 1 bit of information. The ideal type of question therefore has a coin-flip 50/50 probability.
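As a quick numerical check of that claim (a tiny Python sketch scanning a grid of probabilities), the 1-bit maximum is indeed attained by the 50/50 question:

```python
import math

def binary_entropy(p):
    # Same helper as in the sketch above.
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Scan p = 0.01, 0.02, ..., 0.99 and pick the question with the highest entropy.
best_p = max((p / 100 for p in range(1, 100)), key=binary_entropy)
print(best_p, binary_entropy(best_p))  # 0.5 1.0 -- the coin-flip question gives 1 bit
```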
Best Answer
Of course, things will depend on the precise perspective you take, but here's a pretty natural view from channel coding. Take a $q$-ary symmetric channel, i.e., $$ P(Y = x'|X = x) = \begin{cases} 1-\pi & x' = x\\ \pi/(q-1) & x' \neq x\end{cases}.$$
The capacity of this channel is $$ \max_{p_X} H(Y) - H(Y|X).$$ Here $H(Y|X)$ is a constant that does not depend on the input law $p_X$: $$ H(Y|X) = -(1-\pi) \log (1-\pi) + \pi\log(q-1) - \pi \log\pi.$$
Of course, placing the uniform law on $X$ further induces the uniform law on $Y$, so the capacity is $$ \log q - \pi \log(q-1) + \pi \log \pi + (1-\pi) \log(1-\pi).$$ Normalising by $\log q$, the rate needed to communicate a uniform $q$-ary random variable, and factoring it back out, we can write the capacity as $$ C = \log q \left( 1 - \bigl(\pi \log_q(q-1) - \pi \log_q(\pi) - (1-\pi) \log_q(1-\pi)\bigr)\right).$$
Compare this to the capacity of a BSC, $1 - h_2(\pi)$, and you see the similarity. The point, of course, is that if you're working with a $q$-ary alphabet, the $q$-ary symmetric channel is a basic model that you'll deal with often, so it's useful to have a generalisation of $h_2(\pi)$ that is pertinent here.
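If it helps, here is a small numerical sanity check in Python (the values $q = 4$ and $\pi = 0.1$ are arbitrary choices for illustration): it confirms that $\log q - H(Y|X)$ matches the $\log q\,(1 - \cdot)$ form above, and that the $q = 2$ case collapses to the familiar BSC capacity $1 - h_2(\pi)$.

```python
import math

def H_cond(q, pi):
    """H(Y|X) in bits for a q-ary symmetric channel with total error probability pi."""
    return -(1 - pi) * math.log2(1 - pi) + pi * math.log2(q - 1) - pi * math.log2(pi)

def capacity(q, pi):
    """Capacity in bits, achieved by the uniform input law: log2(q) - H(Y|X)."""
    return math.log2(q) - H_cond(q, pi)

def h_q(q, pi):
    """The q-ary entropy function (all logs taken to base q)."""
    return (pi * math.log(q - 1, q)
            - pi * math.log(pi, q)
            - (1 - pi) * math.log(1 - pi, q))

q, pi = 4, 0.1
print(capacity(q, pi))                    # ~1.37 bits
print(math.log2(q) * (1 - h_q(q, pi)))    # same value, via the normalised form
print(capacity(2, 0.1), 1 - h_q(2, 0.1))  # BSC: both give ~0.531 bits
```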