What’s the maximum entropy for a discrete distribution with nonnegative support and a given mean and variance?

entropy, lagrange-multiplier, numerical-methods, probability

I've seen this question, but its answer doesn't actually contain an answer, merely a citation to a text. I also know how to set up this problem; the barrier is turning that setup into something usable as a solution.

If I want to find the maximum-entropy distribution under the constraints above, I would set up the following Lagrangian, where $\mu$ is the mean and $\sigma^2$ is the variance:

$$G = -\sum_{i=0}^{\infty}p_i\ln(p_i) + \lambda\left(\sum_{i=0}^{\infty}p_i-1\right)+\kappa\left(\sum_{i=0}^{\infty}i\cdot p_i-\mu\right)+\zeta\left(\sum_{i=0}^{\infty}i^2\cdot p_i-(\sigma^2+\mu^2)\right) $$

When I take the partial derivative with respect to $p_i$ and set it to zero,

$$\frac{\partial G}{\partial p_i} = -\ln(p_i) - 1 + \lambda + \kappa i + \zeta i^2 = 0,$$

I obtain:

$$p_i = e^{\zeta i^2+\kappa i +\lambda -1}$$

But when I start trying to solve for the Lagrange multipliers, I end up with formulae that simply do not produce anything workable. It's not merely that there doesn't seem to be an analytical solution: the equations don't even reduce to a transcendental form I can approximate numerically with any ease. Does an analytical solution for the Lagrange multipliers exist? If not, what formulae can I use to compute a precise numerical approximation for arbitrary $\mu$ and $\sigma$?

Best Answer

You are trying to find the constants $a, b, c$ with the constraints $$\sum_n p_n = 1$$ $$\sum_n n\cdot p_n = \mu$$ $$\sum_n n^2 p_n = \sigma^2 + \mu^2$$

where the sums are over the nonnegative integers and $p_n = e^{a n^2+bn +c}$. For the sums to converge, it must be true that $a< 0$. Relative to your notation, I have made the replacements $a = \zeta$, $b = \kappa$, $c = \lambda-1$.

From the first constraint, $c$ can be expressed in terms of $a$ and $b$ as $$c = -\ln\left(\sum_n e^{a n^2 + b n}\right)$$

The remaining two constraints can be expressed as $$e^c\sum_n n\cdot e^{a n^2+bn} = \mu$$ $$e^c\sum_n n^2 e^{an^2 + bn} = \sigma^2 + \mu^2$$
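For concreteness, here is a minimal Python sketch of these quantities (the helper name `moments` and the truncation point `N = 2000` are illustrative choices, not part of the derivation). It evaluates the sums by direct truncation, which is safe because the terms decay like $e^{an^2}$ for $a < 0$:

```python
import numpy as np

def moments(a, b, N=2000):
    """Return (c, m1, m2) for p_n = e^{a n^2 + b n + c}, n = 0..N.

    c = -ln(sum_n e^{a n^2 + b n}) enforces normalization; m1 and m2 are
    the first two moments. Truncation at N is harmless because, for a < 0,
    the terms die off like e^{a n^2}."""
    n = np.arange(N + 1, dtype=float)
    e = a * n**2 + b * n
    w = np.exp(e - e.max())            # shift the exponent for stability
    c = -(np.log(w.sum()) + e.max())   # c = -ln Z
    p = w / w.sum()                    # normalized probabilities p_n
    return c, (n * p).sum(), (n * n * p).sum()
```

In this notation the two remaining constraints are `moments(a, b)[1] == mu` and `moments(a, b)[2] == sigma**2 + mu**2`.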

From here, I would suggest using gradient descent for a numerical solution. Specifically, let $G(a, b)$ be the residual vector $$G(a, b) = \begin{pmatrix} e^c\sum_n n\, e^{an^2 + bn} - \mu \\ e^c\sum_n n^2 e^{an^2 + bn} - \left(\sigma^2 + \mu^2\right) \end{pmatrix}$$

The function to minimize is then $\frac{1}{2}G^T(a, b)G(a, b)$, which is equal to $$F(a, b) = \frac{1}{2}\left( \left( e^c\sum_n n e^{an^2 + bn} - \mu \right)^2 + \left( e^c\sum_n n^2 e^{an^2 + bn} - \left(\sigma^2 + \mu^2\right) \right)^2 \right)$$

Then, given the current iterate $(a_k, b_k)$ (using $k$ for the iteration index, since $n$ is already the summation index), generate $(a_{k+1}, b_{k+1})$ using $$(a_{k+1}, b_{k+1}) = (a_k, b_k) - \gamma_k \nabla F(a_k, b_k)$$

In this case, $\nabla F(a, b)$ will be equal to $J_G^T(a, b)\, G(a, b)$. Writing $m_k = e^c\sum_n n^k e^{an^2 + bn}$ for the $k$-th moment, and remembering that $c$ depends on $a$ and $b$ through the normalization constraint (differentiating it gives $\partial c/\partial a = -m_2$ and $\partial c/\partial b = -m_1$), the transposed Jacobian $J_G^T(a, b)$ works out to $$\begin{pmatrix} m_3 - m_1 m_2 & m_4 - m_2^2 \\ m_2 - m_1^2 & m_3 - m_1 m_2 \end{pmatrix}$$

For the initial guess $(a_0, b_0)$, I would recommend using the values from the corresponding normal density $e^{-(x-\mu)^2/(2\sigma^2)}$: that is, $a_0 = -\frac{1}{2\sigma^2}$, $b_0 = \frac{\mu}{\sigma^2}$. Iterating the gradient descent step then refines this into a numerical solution.
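Putting the pieces together, here is a minimal sketch of the whole iteration (again in Python; the step size `gamma`, iteration count, and truncation `N` are illustrative and may need tuning for your $\mu$ and $\sigma$):

```python
import numpy as np

def solve(mu, sigma, iters=100_000, gamma=1e-5, N=2000):
    """Minimize F(a, b) = 0.5 * ||G(a, b)||^2 by gradient descent,
    starting from the normal-density guess a0 = -1/(2 sigma^2),
    b0 = mu / sigma^2."""
    a, b = -0.5 / sigma**2, mu / sigma**2
    n = np.arange(N + 1, dtype=float)
    for _ in range(iters):
        e = a * n**2 + b * n
        p = np.exp(e - e.max())                    # shift exponent for stability
        p /= p.sum()                               # p_n = e^{a n^2 + b n + c}
        m = [(n**k * p).sum() for k in range(5)]   # moments m_0 .. m_4
        g1 = m[1] - mu                             # mean residual
        g2 = m[2] - (sigma**2 + mu**2)             # second-moment residual
        # grad F = J_G^T G, with the Jacobian in central-moment form
        da = (m[3] - m[1] * m[2]) * g1 + (m[4] - m[2]**2) * g2
        db = (m[2] - m[1]**2) * g1 + (m[3] - m[1] * m[2]) * g2
        a, b = a - gamma * da, b - gamma * db
    return a, b

a, b = solve(mu=4.0, sigma=2.0)   # example: mean 4, standard deviation 2
```

Since $G$ is a two-equation, two-unknown system, plain gradient descent can be slow when $J_G^T J_G$ is ill-conditioned; if speed matters, a Newton step, solving $J_G\,\delta = -G$ at each iteration, typically converges in a handful of steps from the same starting point.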

To approximate $\sum_n n^k e^{an^2 + bn}$, which is needed in several places, you could just sum up to some maximal $n$. A better way, however, is the Euler-Maclaurin summation formula: since $f$ and all of its derivatives vanish at infinity, the sum is approximately $$\int_0^{\infty}x^k e^{ax^2 + bx}\,dx + \frac{f(0)}{2} - \sum_{m=1}^p \frac{B_{2m}}{(2m)!} f^{(2m-1)}(0) $$

where $f(x) = x^k e^{ax^2 + bx}$ (so the $f(0)/2$ term contributes only for $k = 0$), $B_{2m}$ are the Bernoulli numbers, and $p$ is a nonnegative integer. The integrals can all be evaluated in "closed form" (in terms of the error function) in Mathematica, but the results are long for $k \ge 2$.
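For reference, a standard computation (writing $\alpha = -a > 0$ and completing the square in the exponent) gives the $k = 0$ integral in closed form: $$\int_0^{\infty} e^{ax^2 + bx}\,dx = \frac{1}{2}\sqrt{\frac{\pi}{\alpha}}\; e^{b^2/(4\alpha)} \operatorname{erfc}\!\left(-\frac{b}{2\sqrt{\alpha}}\right)$$ The $k \ge 1$ integrals follow by differentiating this expression repeatedly with respect to $b$, which is exactly why the results grow long for $k \ge 2$.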