Is there a way to find the maximum entropy distribution with given values for the first three moments, when the support is the set of real numbers? Or, without loss of generality, with mean zero, unit variance and skewness $\gamma$? How can I do that?
Probability – Maximum Entropy Distribution with Given Mean, Variance, and Skewness
entropy, maximum-principle, probability-distributions
Related Solutions
- Discrete case
In the discrete case you need to consider the functional
$$H[p]=-\sum_{i=1}^n p_i \ln(p_i)+\lambda(\sum_{i=1}^n p_i-1)$$
as we consider a single constraint.
Setting $\frac{\partial H[p]}{\partial p_i}=0$ for all $i=1,\dots,n$ we arrive at
$$-\ln(p_i)-1+\lambda=0\Leftrightarrow p_i=e^{\lambda-1}.$$
Imposing $\sum_{i=1}^n p_i-1=0$, one gets
$\lambda=1-\ln(n)$, or $p_i=e^{1-\ln(n)-1}=\frac{1}{n}$.
In summary, the desired distribution is the uniform probability distribution; a quick numerical check is sketched below.
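As an illustrative sanity check (not part of the derivation), one can compare the entropy of the uniform distribution against random points on the probability simplex; none should exceed $\ln(n)$:

```python
import numpy as np

# Illustrative check: among probability vectors of length n, the
# uniform distribution attains the maximum entropy ln(n).
rng = np.random.default_rng(0)
n = 5

def entropy(p):
    return -np.sum(p * np.log(p))

print("uniform:", entropy(np.full(n, 1 / n)), "= ln(n) =", np.log(n))
for _ in range(3):
    p = rng.dirichlet(np.ones(n))  # a random point on the simplex
    print("random: ", entropy(p))  # always below ln(n)
```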
- Continuous case
The continuous case needs more care, due to the nontrivial integration range. We want to maximize the functional
$$H[p]=-\int_{0}^{\infty}p(x)\ln(p(x))dx+\lambda(\int_{0}^{\infty}p(x)dx-1), $$
where $p$ has support $[0,\infty)$ and $p(0)=p(\infty)=0$. We apply the calculus of variations by considering any perturbation $\phi$ s.t. $\phi(0)=\phi(\infty)=0$, so that $p+\epsilon\phi$ keeps the boundary values of $p$. We compute the variation
$$\frac{\delta H}{\delta\phi}|_{p}=\lim_{\epsilon\rightarrow 0} \frac{H[p+\epsilon\phi]-H[p]}{\epsilon}=\lim_{\epsilon\rightarrow 0}\frac{1}{\epsilon}\left[\int_{0}^{\infty}\left(F(p+\epsilon\phi,x)-F(p,x)\right)dx+ \lambda(\int_{0}^{\infty}\epsilon\phi dx)\right],$$
where $F(p,x)=-p(x)\ln(p(x))$ and $F(p+\epsilon\phi,x)=-(p(x)+\epsilon\phi)\ln(p(x)+\epsilon\phi)$.
Using
$$F(p+\epsilon\phi,x)-F(p,x)=\epsilon\phi\frac{\partial F}{\partial p}(p,x)+O(\epsilon^2)$$
we have
$$\frac{\delta H}{\delta\phi}\Big|_{p}=\int_{0}^{\infty}\left(\frac{\partial F}{\partial p}(p,x)+\lambda\right)\phi\, dx$$
where $\frac{\partial F}{\partial p}(p,x)=-\ln(p(x))-1$. In summary
$$-\ln(p(x))-1+\lambda=0 $$
or $p(x)=e^{\lambda-1}$, with $\int_0^{\infty}e^{\lambda-1}dx=1$, which is not possible: a constant density cannot integrate to $1$ over an infinite range.
Roughly speaking, the absence of additional constraints, such as the fixed-mean one
$$\int_{0}^{\infty} xp(x)dx=\mu$$
does not allow us to arrive at "more interesting" differential equations for $p(x)$. Note that $F$ does not depend on $p'(x)$: this leads to the simplified Euler-Lagrange equation
$$\frac{\partial F}{\partial p}(p,x)+\lambda=0.$$
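To see how a constraint changes the picture, include the fixed-mean term with a multiplier $\lambda_1$; the stationarity condition becomes (a standard computation, stated here for illustration)
$$-\ln(p(x))-1+\lambda+\lambda_1 x=0\quad\Longleftrightarrow\quad p(x)=e^{\lambda-1}e^{\lambda_1 x}.$$
Imposing $\int_0^{\infty}p(x)\,dx=1$ and $\int_0^{\infty}x\,p(x)\,dx=\mu$ forces $\lambda_1=-1/\mu$ and $e^{\lambda-1}=1/\mu$, so $p(x)=\frac{1}{\mu}e^{-x/\mu}$: the exponential distribution, the well-known maximum entropy distribution on $[0,\infty)$ with fixed mean.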
- Integer support
Solving the Lagrange equations, we get that for the maximum entropy distribution on the integers with mean $0$ and variance $1$, the exponent parameter $a$ satisfies $$ \sum_{k\in\mathbb{Z}}(k^2-1)e^{-ak^2}=0, $$ which gives $a\doteq0.4999998943842821\sim\frac12$. The normalizing coefficient $c$ satisfies $$ c\sum_{k\in\mathbb{Z}}e^{-ak^2}=1, $$ which gives $c\doteq0.3989422361322933\sim0.3989422804014327=\frac1{\sqrt{2\pi}}$.
Thus, the maximum entropy distribution on the integers that has a mean of $0$ and variance of $1$ is $$ p_k=c\,e^{-ak^2}, $$ where $a$ and $c$ are given above (and verified numerically below). These values are extremely close to those of the Gaussian, which has the maximum entropy for a continuous distribution with the same constraints.
Although the function derived above is very close to the Gaussian distribution restricted to $\mathbb{Z}$, $\frac1{\sqrt{2\pi}}e^{-n^2/2}$ is not a probability measure on $\mathbb{Z}$. In fact, the Poisson Summation Formula says that $$ \begin{align} \frac1{\sqrt{2\pi}}\sum_{n\in\mathbb{Z}}e^{-n^2/2} &=1+2\sum_{n=1}^\infty e^{-2\pi^2n^2}\\ &\gt1 \end{align} $$
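A short numerical sketch (my own script, using standard SciPy root-finding) reproduces the values of $a$ and $c$ above, as well as the Poisson-summation observation:

```python
import numpy as np
from scipy.optimize import brentq

# Truncating the sums to |k| <= 100 is harmless: the tails are far
# below double precision for a near 1/2.
ks = np.arange(-100, 101)

# Solve sum_k (k^2 - 1) exp(-a k^2) = 0 for the exponent a.
f = lambda a: np.sum((ks**2 - 1) * np.exp(-a * ks**2))
a = brentq(f, 0.1, 1.0)

# Normalizing coefficient: c * sum_k exp(-a k^2) = 1.
c = 1.0 / np.sum(np.exp(-a * ks**2))
print(a, c)  # ~0.4999998944..., ~0.3989422361...

# Poisson summation: the Gaussian restricted to the integers is
# not normalized -- the total mass is slightly greater than 1.
print(np.sum(np.exp(-ks**2 / 2)) / np.sqrt(2 * np.pi))
```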
Best Answer
When the support $S=\mathbb{R}$, there is no maximum entropy distribution with mean $\mu\in\mathbb{R}$, variance $\sigma^2\in\mathbb{R}_{>0}$ and skewness $\gamma\in\mathbb{R}_{\neq 0}$. You can always find distributions that satisfy those constraints, but the set of entropies they can take on is open: its supremum (the entropy of $\mathcal{N}(\mu,\sigma^2)$, as explained below) is approached but never attained, so there is no maximum value.
For $S=[-a, +a]$ and for some sufficiently large $a\in\mathbb{R}$, there is a maximum entropy distribution $\mathcal{P}(a,\mu,\sigma^2,\gamma)$ with mean $\mu$, variance $\sigma^2$ and skewness $\gamma$, and it has a probability density function of the form $f(x) = Z^{-1}e^{\lambda_1 x + \lambda_2 x^2 + \lambda_3 x^3}$. Assuming that $\gamma\neq 0$, this distribution has a strictly smaller entropy than the maximum entropy distribution with the same mean and variance but with zero skewness, which is $\mathcal{N}(\mu,\sigma^2)$. For $a - |\mu| \gg \sigma$, $\mathcal{P}(a,\mu,\sigma^2,\gamma)$ looks practically like $\mathcal{N}(\mu,\sigma^2)$, with a small amount of its probability mass taken and pressed against one of the interval endpoints. In the limit as $a\to\infty$, $\mathcal{P}(a,\mu,\sigma^2,\gamma)$ approaches $\mathcal{N}(\mu,\sigma^2)$, and therefore its entropy approaches that of $\mathcal{N}(\mu,\sigma^2)$.
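To make the finite-support case concrete, here is a rough numerical sketch of fitting the multipliers on $[-a, a]$. The half-width $a$, the target $\gamma$, and the solver choice are my own assumptions, and the fit can be numerically delicate, especially for large $a$ or extreme $\gamma$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import fsolve

a, gamma = 4.0, 0.3  # assumed support [-a, a] and target skewness

def raw_moments(lam):
    """First three raw moments of f(x) ~ exp(l1*x + l2*x^2 + l3*x^3) on [-a, a]."""
    l1, l2, l3 = lam
    g = lambda x, k: x**k * np.exp(l1*x + l2*x**2 + l3*x**3)
    Z = quad(g, -a, a, args=(0,))[0]
    return [quad(g, -a, a, args=(k,))[0] / Z for k in (1, 2, 3)]

def residual(lam):
    m1, m2, m3 = raw_moments(lam)
    # With mean 0 and variance 1 enforced, the third raw moment
    # equals the skewness.
    return [m1, m2 - 1.0, m3 - gamma]

# Start from the Gaussian solution (l1, l2, l3) = (0, -1/2, 0).
lam = fsolve(residual, [0.0, -0.5, 0.0])
print("lambda:", lam)
print("moments:", raw_moments(lam))
```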
A probability distribution with a small amount of its probability mass moved far away from the distribution center in order to produce the skewness was not what I had in mind, though. (Bad distribution!) Even though the skewness of this distribution has the specified value, its higher order moments approach either $\infty$ or $-\infty$ as $a\to\infty$; it becomes apparent that to get something more reasonable, some of the higher order moments need to be constrained as well. Constraining the kurtosis (the fourth standardized moment) to a maximum value makes the problem solvable when $S=\mathbb{R}$ and introduces an $x^4$ term with a negative coefficient in the exponent, which keeps the probability density function bounded and makes all higher order moments well defined as well.
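Concretely, on my reading of the above, the density would then take the form
$$f(x)=\frac{1}{Z}\,e^{\lambda_1 x+\lambda_2 x^2+\lambda_3 x^3+\lambda_4 x^4},\qquad \lambda_4<0,\qquad Z=\int_{-\infty}^{\infty}e^{\lambda_1 x+\lambda_2 x^2+\lambda_3 x^3+\lambda_4 x^4}\,dx,$$
where the negative $\lambda_4$ makes the exponent tend to $-\infty$ as $|x|\to\infty$, so $Z$ and every moment are finite.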