I believe the second paper you cited (by Harremoës) is actually the answer you're looking for. The Poisson distribution describes the number of occurrences of an event in a fixed interval, under the assumption that occurrences are independent. In particular, the requirement that the events be independent means that not every discrete distribution is a valid candidate for describing this system, and it motivates restricting attention to sums of (arbitrarily many) independent Bernoulli variables. Harremoës then shows that if you further constrain the expected value (i.e., $\lambda$), the maximum entropy distribution is the Poisson distribution.
So, the Poisson distribution is the maximum entropy distribution given constraints of counting independent events and having a known expected value.
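As a quick numerical check (not a proof) in the spirit of this result, you can compare the entropy of $\mathrm{Binomial}(n, \lambda/n)$, which is the sum of $n$ identical independent Bernoulli variables with total mean $\lambda$, against the entropy of $\mathrm{Poisson}(\lambda)$; the specific $\lambda$ and values of $n$ below are arbitrary illustrative choices:

```python
# A small numerical check (not a proof): Binomial(n, lambda/n) is a sum of n identical
# independent Bernoulli variables with total mean lambda, and its entropy approaches
# that of Poisson(lambda) from below as n grows.
from scipy.stats import binom, poisson

lam = 3.0
for n in (5, 10, 50, 500):
    H_n = float(binom(n, lam / n).entropy())
    print(f"Binomial(n={n}, p={lam / n:.4f})  entropy = {H_n:.4f} nats")
print(f"Poisson(lambda={lam})  entropy = {float(poisson(lam).entropy()):.4f} nats")
```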
That said, you can also easily reverse-engineer a (contrived) constraint for which the Poisson distribution would be the maximum entropy distribution.
Let our unknown constraint be $\mathbb{E}[f(k)] = c$. Maximizing the entropy with this constraint, along with the mean being $\lambda$, gives the minimization problem
$\sum_k p(k) \ln p(k) - \alpha \left( \sum_k p(k) - 1\right) - \beta\left(\sum_k k p(k) - \lambda\right) - \gamma \left( \sum_k p(k)f(k) - c \right)$,
where $\alpha$, $\beta$, and $\gamma$ are Lagrange multipliers. Taking the derivative with respect to $p(k)$ and setting it to zero yields
$\ln p(k) = -1 + \alpha + \beta k + \gamma f(k).$
We already know the Poisson distribution has the form $p(k) = e^{-\lambda}\lambda^k/k!$, or $\ln(p(k)) = -\lambda + k \ln(\lambda) - \ln(k!)$. Therefore, we can guess that $f(k)$ has the functional form $\ln(k!)$.
So, the Poisson distribution maximizes entropy when $p$ has mean $\lambda$ and $\mathbb{E}[\ln k!]$ equals some particular value depending on $\lambda$.
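If you want to verify this numerically, here is a small sketch; the truncated support $\{0,\dots,K\}$, the choice of $\lambda$, and the SLSQP optimizer are my own illustrative choices. It maximizes the entropy subject to normalization, $\mathbb{E}[k]=\lambda$, and $\mathbb{E}[\ln k!]$ fixed at the value it takes under $\mathrm{Poisson}(\lambda)$, and compares the result with the Poisson pmf:

```python
# Numerical sanity check of the reverse-engineered constraint (a sketch, not a proof).
# Assumptions that are mine, not the derivation's: support truncated to {0, ..., K},
# lambda = 2, and SLSQP as the optimizer.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln          # gammaln(k + 1) = ln(k!)
from scipy.stats import poisson

lam, K = 2.0, 30
k = np.arange(K + 1)
target = poisson.pmf(k, lam)
c = np.sum(target * gammaln(k + 1))        # E[ln k!] under Poisson(lam)

def neg_entropy(p):
    q = np.clip(p, 1e-300, None)           # avoid log(0); 0 * log(0) is treated as 0
    return np.sum(p * np.log(q))

constraints = [
    {"type": "eq", "fun": lambda p: p.sum() - 1.0},                   # normalization
    {"type": "eq", "fun": lambda p: (k * p).sum() - lam},             # mean = lambda
    {"type": "eq", "fun": lambda p: (gammaln(k + 1) * p).sum() - c},  # E[ln k!] = c
]
res = minimize(neg_entropy, np.full(K + 1, 1.0 / (K + 1)),
               bounds=[(0.0, 1.0)] * (K + 1), constraints=constraints, method="SLSQP")
print("max |p - Poisson pmf| =", np.abs(res.x - target).max())        # should be small
```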
This approach may not be very satisfying, since it's not clear why we would want a distribution with a specified expectation value of $\ln k!$. The Johnson paper you cited is (in my opinion) similarly unsatisfying, since it essentially proves that the Poisson distribution is the maximum entropy distribution among distributions which are "more log-concave than the Poisson distribution" (the ultra log-concave distributions).
Note that the problem of finding the most efficient series of questions with "yes-no" answers in order to determine the outcome of $X$ is equivalent to what is known in information theory as the source coding problem: assign to the $k$-th outcome a binary codeword of length $L_k$ such that the codewords are uniquely decodable and the average length $\bar{L}\triangleq \sum_k p_k L_k$ is as small as possible.
The method you suggest for finding the outcome of a non-equiprobable $X$ is essentially assigning the following codewords to each outcome (ordered in decreasing probability):
$$
\begin{align}
1&\text{ }(L_1=1)\\
01&\text{ }(L_2=2)\\
001&\text{ }(L_3=3)\\
\vdots\\
00\ldots01 & \text{ }(L_n=n)
\end{align}
$$
However, this is *not* the best we can do. This is because the most efficient question at each stage of the "game" is the one whose two outcomes are as close as possible to equiprobable, which, in general, is not accomplished by your approach. The optimal series of questions should be as follows: For the first question, partition the sample space in two sets (events) whose probabilities are as close to $0.5$ as possible and ask whether $X$ belongs to one of these sets. Once you know the set $X$ is in, repeat the procedure for the second question, this time considering only the "reduced" sample space corresponding to the known set, and so on.
One well-known source coding algorithm that follows this principle is Huffman source coding. It can be shown that the average Huffman codeword length satisfies $\bar{L} \in [H(\{p_i\}), H(\{p_i\})+1]$, i.e., the average number of binary questions required to determine the outcome of $X$ is close to the entropy of $X$.
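For concreteness, here is a minimal Huffman sketch (an illustration, not a production coder; the example distribution is made up) comparing the entropy $H(X)$, the average Huffman codeword length, and the average length of the $1, 01, 001, \ldots$ scheme above:

```python
# A minimal Huffman sketch (illustrative only). It compares H(X), the average Huffman
# codeword length, and the average length of the "1, 01, 001, ..." scheme for an
# example distribution.
import heapq
from math import log2

def huffman_lengths(probs):
    """Codeword lengths produced by Huffman's algorithm (lengths suffice to compute L-bar)."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]   # (probability, tiebreak, symbols)
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)                 # two least likely nodes
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:                               # each merge adds one bit to every symbol inside
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, min(s1 + s2), s1 + s2))
    return lengths

p = [0.3, 0.25, 0.2, 0.15, 0.1]                         # outcomes in decreasing probability
H = -sum(pi * log2(pi) for pi in p)
L_huffman = sum(pi * li for pi, li in zip(p, huffman_lengths(p)))
L_unary = sum(pi * (i + 1) for i, pi in enumerate(p))   # L_k = k for the scheme above
print(f"H(X) = {H:.3f} bits,  Huffman L-bar = {L_huffman:.3f},  '1, 01, 001, ...' L-bar = {L_unary:.3f}")
```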
First, your statement is confusing because your "variance" is not the variance of the random variable for which the "entropy" is computed; rather, it's the mean squared dispersion of the values of the discrete probability function.
With this caveat, the answer is yes, because with the given restrictions, the distribution that maximizes entropy is the uniform distribution, which has zero "variance" ... with that special meaning. But this seems rather trivial and not useful.

Update: I'd missed the condition that the $a_i$ must be integers. Then (allow me to use $N$ instead of your $N-1$, it feels more natural), I conjecture that the entropy is maximized by the "closest to uniform" distribution: let $u= \lfloor A/N\rfloor$ and $n_u = N(u+1)-A$. Then we have $a_i = u$ for $i=1 \cdots n_u$ and $a_i = u+1$ for $i=n_u+1 \cdots N$. This also minimizes your "variance".
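Here is a small sketch of this construction, assuming (my reading of the question) that the $a_i$ are nonnegative integers summing to $A$ and the entropy is computed on the normalized values $p_i = a_i/A$; the particular $A$ and $N$ are arbitrary examples:

```python
# Sketch of the conjectured maximizer: the "closest to uniform" integer vector.
# Assumption (mine): a_i are nonnegative integers summing to A, entropy is taken on a_i / A.
import numpy as np

def closest_to_uniform(A, N):
    u = A // N                      # u = floor(A / N)
    n_u = N * (u + 1) - A           # number of entries equal to u
    return np.array([u] * n_u + [u + 1] * (N - n_u))

A, N = 17, 5
a = closest_to_uniform(A, N)        # -> [3 3 3 4 4], which sums to 17
p = a / A
print("a =", a, " sum =", a.sum())
print("entropy =", float(-(p[p > 0] * np.log(p[p > 0])).sum()))
print('"variance" =', float(((a - A / N) ** 2).mean()))
```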
Added: To show that this maximizes the entropy: consider a distribution $\{a_i\}$ where $a_M - a_m >1$ ($a_M, a_m$ are values that attain the maximum and minimum). Show that changing $a_M := a_M -1$, $a_m := a_m+1$ gives another valid distribution with larger entropy (this can be shown from concavity of the entropy; see also Cover and Thomas, exercise 2.28). Then show that repeating this process eventually ends in the distribution given above.
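And a quick numerical illustration of this smoothing argument (a demo under the same assumptions as above, not a proof; the starting vector is arbitrary):

```python
# Repeatedly move one unit from a maximal entry to a minimal entry; the entropy of a / A
# increases at every step and the process stops at the "closest to uniform" vector above.
# The starting vector is an arbitrary example with A = 17, N = 5.
import numpy as np

def entropy(a):
    p = np.asarray(a, dtype=float) / sum(a)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

a = [10, 4, 2, 1, 0]
print(a, f"entropy = {entropy(a):.4f}")
while max(a) - min(a) > 1:
    a[a.index(max(a))] -= 1         # a_M := a_M - 1
    a[a.index(min(a))] += 1         # a_m := a_m + 1
    print(a, f"entropy = {entropy(a):.4f}")
```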