[Math] Maximum entropy principle for Poisson distribution

entropy, probability

I know that certain probability distributions can be derived from the requirement that entropy be maximal subject to a constraint such as fixed variance. In the case of fixed variance, for example, one finds the normal distribution. In particular, the maximisation is over the set of all (!) continuous PDFs with that fixed variance.
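For concreteness, here is a quick numerical check (a sketch assuming SciPy is available; the Laplace and uniform comparisons are just illustrative choices) that the unit-variance normal has larger differential entropy than other unit-variance distributions:

```python
import numpy as np
from scipy import stats

# All three distributions below have variance 1; the normal should come out
# with the largest differential entropy, 0.5 * ln(2*pi*e) ≈ 1.4189 nats.
print("normal :", stats.norm(scale=1.0).entropy())
print("laplace:", stats.laplace(scale=1 / np.sqrt(2)).entropy())                   # Var = 2 b^2 = 1
print("uniform:", stats.uniform(loc=-np.sqrt(3), scale=2 * np.sqrt(3)).entropy())  # Var = w^2 / 12 = 1
```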

Now my question is: is there a similarly general derivation of the Poisson distribution as a maximum entropy distribution? E.g. by fixing that the mean and variance are equal and maximising entropy? I have found a couple of articles, but they always seem to prove maximality over a restricted set of discrete distributions. Is that because there is no more general maximum entropy principle for the Poisson distribution? If so, is it because the discrete case is simply more complex than the continuous one?

Best Answer

I believe the second paper you cited (by Harremoës) is actually the answer you're looking for. The Poisson distribution describes the number of occurrences of an event in a fixed interval, under the assumption that occurrences are independent. In particular, the requirement that the events be independent means that not every discrete distribution is a valid candidate for describing this system, and it motivates restricting attention to sums of independent Bernoulli random variables (with arbitrarily many terms). Harremoës then shows that if you further constrain the expected value (i.e., fix $\lambda$), the maximum entropy distribution in this class is the Poisson distribution.

So, the Poisson distribution is the maximum entropy distribution given constraints of counting independent events and having a known expected value.
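As a quick numerical illustration (a sketch assuming SciPy; $\lambda = 3$ and the values of $n$ are arbitrary choices), a Bernoulli sum with mean $\lambda$, such as a Binomial$(n, \lambda/n)$, has entropy below that of the Poisson$(\lambda)$ and approaches it as $n$ grows:

```python
from scipy import stats

lam = 3.0
print("Poisson       :", stats.poisson(lam).entropy())

# Binomial(n, lam/n) is the sum of n independent Bernoulli(lam/n) variables with
# mean lam; its entropy stays below the Poisson entropy and increases with n.
for n in (5, 10, 50, 500):
    print(f"Binomial n={n:3d}:", stats.binom(n, lam / n).entropy())
```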

That said, you can also easily reverse-engineer a (contrived) constraint for which the Poisson distribution would be the maximum entropy distribution.

Let our unknown constraint be $\mathbb{E}[f(k)] = c$. Maximizing the entropy subject to this constraint, together with normalization and the mean being $\lambda$, means minimizing the Lagrangian

$\sum_k p(k) \ln p(k) - \alpha \left( \sum_k p(k) - 1\right) - \beta\left(\sum_k k p(k) - \lambda\right) - \gamma \left( \sum_k p(k)f(k) - c \right)$,

where $\alpha$, $\beta$, and $\gamma$ are Lagrange multipliers. Setting the derivative with respect to $p(k)$ to zero yields

$\ln p(k) = -1 + \alpha + \beta k + \gamma f(k)$.

We already know the Poisson distribution has the form $p(k) = e^{-\lambda}\lambda^k/k!$, or $\ln(p(k)) = -\lambda + k \ln(\lambda) - \ln(k!)$. Therefore, we can guess that $f(k)$ has the functional form $\ln(k!)$.
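Explicitly, matching the two expressions for $\ln p(k)$ term by term gives

$-1 + \alpha = -\lambda, \qquad \beta = \ln \lambda, \qquad \gamma f(k) = -\ln(k!)$,

so one consistent choice of multipliers is $\alpha = 1 - \lambda$, $\beta = \ln \lambda$, and $\gamma = -1$ with $f(k) = \ln(k!)$.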

So, the Poisson distribution maximizes entropy among distributions $p$ with mean $\lambda$ and with $\mathbb{E}[\ln k!]$ fixed at a particular value depending on $\lambda$ (namely, its value under the Poisson$(\lambda)$ distribution itself).
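As a sanity check, here is a minimal numerical sketch (assuming SciPy; the truncation at $K = 40$, $\lambda = 3$, and the SLSQP settings are arbitrary choices) that maximizes entropy on a truncated support subject to exactly these two constraints; the optimizer should recover the Poisson pmf up to its tolerance:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln
from scipy.stats import poisson

lam = 3.0
K = 40                          # truncated support {0, ..., K-1}; Poisson(3) mass beyond is negligible
k = np.arange(K)
target = poisson.pmf(k, lam)

# Fix E[ln k!] at its value under Poisson(lam), using gammaln(k + 1) = ln(k!)
c = np.sum(target * gammaln(k + 1))

def neg_entropy(p):
    p = np.clip(p, 1e-12, None)              # avoid log(0)
    return np.sum(p * np.log(p))

constraints = [
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},                 # normalization
    {"type": "eq", "fun": lambda p: np.sum(k * p) - lam},             # mean = lam
    {"type": "eq", "fun": lambda p: np.sum(gammaln(k + 1) * p) - c},  # E[ln k!] = c
]

res = minimize(neg_entropy, x0=np.full(K, 1.0 / K), method="SLSQP",
               bounds=[(0.0, 1.0)] * K, constraints=constraints,
               options={"maxiter": 500, "ftol": 1e-12})

print("max |p - Poisson pmf|  :", np.abs(res.x - target).max())
print("entropy found / Poisson:", -neg_entropy(res.x), poisson(lam).entropy())
```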

This approach may not be very satisfying, since it's not clear why we would want a distribution with a specified expected value of $\ln k!$. The Johnson paper you cited is (in my opinion) similarly unsatisfying, since it essentially proves that the Poisson distribution is the maximum entropy distribution among distributions that are at least as log-concave as the Poisson distribution itself (the ultra-log-concave distributions).