Probability – Expected Entropy Based on Dirichlet Distribution

entropy, information theory, probability

The Dirichlet distribution essentially defines the probability that a sample came from a particular multinomial distribution, under the assumption that the prior probability of every multinomial distribution having generated the sample is equal.

Each multinomial distribution has a corresponding categorical distribution, and the entropy of that categorical distribution is given by

$$-\sum_{x \,\in\, \text{states}}\Pr(x)\ln(\Pr(x))$$

Given a point $p=(p_1,p_2,p_3,\dots,p_n)$, with $\sum_i p_i=1$, chosen at random according to a Dirichlet distribution with parameters $k_1,\dots,k_n$, the entropy of the corresponding categorical distribution is:

$$H(p)=-\sum_{i=1}^n p_i \ln(p_i)$$

What is the expected value of $H(p)$?

In the special case where the Dirichlet distribution is defined by only $k_1$ and $k_2$, so that $p$ is 2-dimensional, the expected entropy $H(p)$ is given by the formula
$$\frac{(k_1+k_2) H_{(k_1+k_2-1)}-k_1 H_{k_1}-k_2 H_{(k_2-1)}}{k_1+k_2}$$

where $H_n$ is the $n$th harmonic number. However, I haven't been able to work out the answer for higher numbers of dimensions.
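(For concreteness, here is a minimal numerical sanity check of the two-dimensional formula, assuming NumPy is available; the function names are purely illustrative. It compares the closed form against a Monte Carlo estimate of the expected entropy.)

```python
import numpy as np

def harmonic(n):
    # n-th harmonic number: H_n = 1 + 1/2 + ... + 1/n
    return sum(1.0 / i for i in range(1, n + 1))

def expected_entropy_2d(k1, k2):
    # Closed form from the question (integer k1, k2)
    A = k1 + k2
    return (A * harmonic(A - 1) - k1 * harmonic(k1) - k2 * harmonic(k2 - 1)) / A

def monte_carlo_entropy(alpha, n_samples=200_000, seed=0):
    # Average the entropy -sum_i p_i ln p_i over points p ~ Dirichlet(alpha)
    rng = np.random.default_rng(seed)
    p = rng.dirichlet(alpha, size=n_samples)
    return -np.sum(p * np.log(p), axis=1).mean()

k1, k2 = 3, 5
print(expected_entropy_2d(k1, k2))    # closed form, ~0.6033
print(monte_carlo_entropy([k1, k2]))  # simulation, should agree to ~3 decimals
```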

Best Answer

Here is a formal proof for general Dirichlet distributions $(\alpha_1, \dots, \alpha_m)$. I use capital $P_i$ to indicate that we are working with random variables.

$$-E\left(\sum_i P_i \log P_i\right)=-\sum_i E(P_i \log P_i).$$ Writing $A = \sum_i \alpha_i$, each marginal satisfies $P_i \sim \mathrm{Beta}(\alpha_i, A -\alpha_i)$, and working with the normalizing constant you can write

$$ -E_{\alpha_i, A-\alpha_i}(P_i \log P_i)= -\frac{\alpha_i}{A}E_{\alpha_i+1, A-\alpha_i}(\log P_i) = \frac{\alpha_i}{A} \left[\psi_0 (A+1)-\psi_0(\alpha_i+1)\right], $$

where the last step follows from a known result (see the Wikipedia page on the Beta distribution): if $X \sim \mathrm{Beta}(\alpha, \beta)$ then $-E(\log X)= \psi_0(\alpha+\beta)-\psi_0(\alpha)$.
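To spell out the normalizing-constant manipulation behind the first equality (this is just the standard Beta-integral identity, written out for completeness):

$$ E_{\alpha_i,\, A-\alpha_i}(P_i \log P_i) = \frac{1}{B(\alpha_i, A-\alpha_i)}\int_0^1 p^{\alpha_i}(1-p)^{A-\alpha_i-1}\log p \, dp = \frac{B(\alpha_i+1, A-\alpha_i)}{B(\alpha_i, A-\alpha_i)}\, E_{\alpha_i+1,\, A-\alpha_i}(\log P_i) = \frac{\alpha_i}{A}\, E_{\alpha_i+1,\, A-\alpha_i}(\log P_i), $$

using $B(\alpha+1,\beta)/B(\alpha,\beta) = \alpha/(\alpha+\beta)$.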

Summing over $i$ gives the general formula
$$E[H(P)] = \psi_0(A+1) - \sum_i \frac{\alpha_i}{A}\,\psi_0(\alpha_i+1),$$
which reduces to your harmonic-number expression in the two-dimensional integer case.
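(As a quick check of this general formula, here is a short sketch assuming NumPy and SciPy's `digamma`; the function names are illustrative. For integer parameters in two dimensions it agrees with the harmonic-number formula from the question, since $\psi_0(n+1) = H_n - \gamma$, and it works in any number of dimensions.)

```python
import numpy as np
from scipy.special import digamma

def expected_dirichlet_entropy(alpha):
    # General result: psi_0(A + 1) - sum_i (alpha_i / A) * psi_0(alpha_i + 1), A = sum_i alpha_i
    alpha = np.asarray(alpha, dtype=float)
    A = alpha.sum()
    return digamma(A + 1) - np.sum((alpha / A) * digamma(alpha + 1))

def harmonic(n):
    # n-th harmonic number
    return sum(1.0 / i for i in range(1, n + 1))

# Two-dimensional integer case: compare with the harmonic-number formula
k1, k2 = 3, 5
closed_form_2d = ((k1 + k2) * harmonic(k1 + k2 - 1)
                  - k1 * harmonic(k1) - k2 * harmonic(k2 - 1)) / (k1 + k2)
print(expected_dirichlet_entropy([k1, k2]))           # ~0.6033
print(closed_form_2d)                                 # same value
print(expected_dirichlet_entropy([2.0, 0.5, 1.0, 3.0]))  # any dimension, any positive alpha
```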
