[Math] What does entropy capture that variance does not


Consider a discrete distribution like this one: $[0.6,0.15,0.1,0.08,0.05,0.02]$

Its entropy is $-\sum_i p_i\log_2 p_i = 1.805$, and its variance is $\frac{\sum_i(p_i - \bar{p})^2}{n} = 0.039188$.
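For concreteness, here is a quick numeric check of both figures (a minimal Python sketch; note the entropy value corresponds to a base-2 logarithm):

```python
import math

p = [0.6, 0.15, 0.1, 0.08, 0.05, 0.02]

# Shannon entropy with a base-2 logarithm (this reproduces the 1.805 figure)
H = -sum(pi * math.log2(pi) for pi in p)

# "Variance" of the probability values themselves, as written above
p_bar = sum(p) / len(p)
V = sum((pi - p_bar) ** 2 for pi in p) / len(p)

print(H)  # ~1.805
print(V)  # ~0.039188
```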

They both measure the spread of this distribution. For distributions like this that are far from uniform, what information does one capture that the other does not?

Best Answer

You have to be more careful about what your outcomes are and what their probabilities are. From what I see, you have 6 outcomes; let's call them $x_1,\dots,x_6$, with probabilities $p_1,\dots,p_6$ as given in your list.

The outcomes can have cardinal values, e.g. rolling an (unfair) die: $x_1 = 1,\dots, x_6 = 6$. They can also be nominal, such as ethnicity: $x_1 =$ black, $x_2 =$ Caucasian, etc.

In the first case, it makes sense to define mean and variance $$ \overline x = \sum_{i=1}^{6} p_ix_i, \qquad \mathbb V = \sum_{i=1}^{6} p_i (x_i-\overline x)^2. $$ The variance measures the (quadratic) spread around the mean. Note that this definition is different from yours.
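As a sketch of these definitions, assuming the die outcomes $x_1 = 1,\dots, x_6 = 6$ together with the probabilities from the question:

```python
p = [0.6, 0.15, 0.1, 0.08, 0.05, 0.02]
x = [1, 2, 3, 4, 5, 6]  # cardinal outcomes of the (unfair) die

# Probability-weighted mean and variance of the random variable X
mean = sum(pi * xi for pi, xi in zip(p, x))
var = sum(pi * (xi - mean) ** 2 for pi, xi in zip(p, x))

print(mean)  # 1.89
print(var)   # ~1.78
```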

In the second case, mean and variance do not make any sense, since you cannot add black to Caucasian, or scale them, square them, etc.

The entropy, on the other hand, can be defined in both cases! Intuitively, it measures the uncertainty of the outcome.

Note that, as Mike Hawk pointed out, it does not care what the outcomes actually are. They can be $x_1 = 1,\dots, x_6 = 6$ or $x_1 = 100,\dots, x_6 = 600$ or ($x_1 =$ black, $x_2 =$ Caucasian, etc.); the result will only depend on the probabilities $p_1,\dots,p_6$. The variance, on the other hand, will be very different for the first two cases (by a factor of 10,000) and will not exist in the third case.
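A small sketch of this invariance (the outcome labelings below are hypothetical; only the probabilities come from the question):

```python
import math

p = [0.6, 0.15, 0.1, 0.08, 0.05, 0.02]

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p)

def variance(p, x):
    mean = sum(pi * xi for pi, xi in zip(p, x))
    return sum(pi * (xi - mean) ** 2 for pi, xi in zip(p, x))

print(entropy(p))                                   # ~1.805 regardless of labels
print(variance(p, [1, 2, 3, 4, 5, 6]))              # ~1.78
print(variance(p, [100, 200, 300, 400, 500, 600]))  # ~17800, i.e. 10000x larger
# For nominal outcomes (black, Caucasian, ...) the variance is undefined,
# while the entropy is still ~1.805.
```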

Your definition of variance is very unconventional: it measures the spread of the actual probability values instead of the outcomes. I think that theoretically this can be made sense of, but I very much doubt that this is the quantity you wish to consider (especially as a medical doctor).

It is definitely not meaningful to compare that quantity to entropy, which measures the uncertainty of the outcome. The entropy is maximal if all outcomes have equal probability $1/6$, whereas that same uniform distribution would yield the minimal value 0 for your definition of variance...
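A last sketch of that contrast, using the uniform distribution over six outcomes:

```python
import math

uniform = [1 / 6] * 6

# Entropy is maximal here: log2(6), roughly 2.585 bits
print(-sum(pi * math.log2(pi) for pi in uniform))

# The question's "variance of the probability values" is minimal (zero) here
p_bar = sum(uniform) / len(uniform)
print(sum((pi - p_bar) ** 2 for pi in uniform) / len(uniform))  # 0.0
```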

Hope this helps.