Maximize (but not minimize?) entropy for probability distribution

entropy, lagrange-multiplier, optimization, probability-distributions

Let $p$ be an arbitrary discrete probability distribution on $\{1, \dots, 10\}$. We can represent such a distribution by a vector indexed by $i$, subject to the constraints $p_i \ge 0$ and $\sum^{10}_{i=1}p_i = 1$. I want to find analytically the $p$ with maximum entropy, which is given by

$$
H(p) = - \sum_{i=1}^{10} p_i \log(p_i)
$$
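(For a quick sanity check of the two candidate extremizers, here is a small NumPy sketch; the names `entropy`, `uniform`, and `dirac` are mine, not part of the question.)

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log(p_i), with the convention 0*log(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0                 # skip zero entries, since 0*log(0) := 0
    return -np.sum(p[nz] * np.log(p[nz]))

uniform = np.full(10, 0.1)     # the conjectured maximizer
dirac = np.eye(10)[0]          # a Dirac distribution: all mass on outcome 1

print(entropy(uniform))        # log(10) ≈ 2.302585
print(entropy(dirac))          # 0.0
```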

My question consists of two parts:

  1. How do I show, using the Lagrange multiplier method, that the optimal $p$ is given by the uniform distribution?

  2. Why can't the Lagrange method be used to find the $p$ with minimal entropy (which I suppose is given by a Dirac distribution)?

For 1. I know that the Lagrangian function associated with this constrained problem is given by

$$
\mathcal{L}(p,\lambda) = - \sum_{i=1}^{10} p_i \log(p_i) + \lambda(1 - \sum^{10}_{i=1}p_i)
$$

and the respective gradient by

$$
\frac{\partial \mathcal{L}}{\partial p_i} = - \log(p_i) - 1 - \lambda
$$

and

$$
\frac{\partial \mathcal{L}}{\partial \lambda} = 1 - \sum^{10}_{i=1}p_i
$$

but I don't know how to continue from there.

Best Answer

  1. When applying the Lagrange method, you set the gradient to zero, which yields $-\log(p_i) - 1 - \lambda = 0$ and $1 - \sum^{10}_{i=1}p_i = 0$. The first equation gives $p_i = e^{-1-\lambda}$, the same value for every $i$; the second is just the normalization constraint $\sum^{10}_{i=1}p_i = 1$. Substituting the first into the second gives $\sum^{10}_{i=1} e^{-1-\lambda} = 10\, e^{-1-\lambda} = 1$, so $e^{-1-\lambda} = \frac 1 {10}$ and hence $p_i = \frac 1 {10}$, as we expect.
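This is easy to cross-check numerically. The sketch below (my own code, not part of the argument; names like `neg_entropy` and `p0` are mine) maximizes $H$ over the simplex with `scipy.optimize.minimize` and recovers the uniform distribution:

```python
import numpy as np
from scipy.optimize import minimize

def neg_entropy(p):
    # Minimize -H(p); clip away from 0 so log is defined at intermediate iterates.
    q = np.clip(p, 1e-12, 1.0)
    return np.sum(q * np.log(q))

n = 10
constraints = [{"type": "eq", "fun": lambda p: np.sum(p) - 1.0}]  # sum p_i = 1
bounds = [(0.0, 1.0)] * n                                         # 0 <= p_i <= 1
p0 = np.random.default_rng(0).dirichlet(np.ones(n))               # random feasible start

res = minimize(neg_entropy, p0, bounds=bounds, constraints=constraints)
print(np.round(res.x, 3))   # each entry close to 0.1
```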

  2. You cannot use your Lagrangian to find the minimum, because the minima (which are indeed all 10 Dirac distributions) lie at the "boundary" of the constraints $p_i \geq 0$ and $p_i \leq 1$, and your Lagrangian does not know anything about these constraints.

Update 1: the Lagrangian does in fact know about the constraints $p_i > 0$, because otherwise $\log(p_i)$ is not defined, but it knows nothing about the boundary where some $p_i = 0$. Setting the gradient to zero can only detect stationary points in the relative interior $\{(p_1, \dots, p_{10})\in \mathbb{R}_{>0}^{10}, \sum^{10}_{i=1}p_i = 1\}$ of the probability simplex. The Dirac measures have nine coordinates equal to $0$, so they lie on the boundary and can never appear as solutions of the stationarity equations: as $p_i \to 0^+$, the partial derivative $-\log(p_i) - 1 - \lambda$ tends to $+\infty$, so the gradient cannot vanish there.
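To see the blow-up concretely, here is a tiny numerical illustration (my own sketch) of the partial derivative $\partial \mathcal{L}/\partial p_i = -\log(p_i) - 1 - \lambda$ as a coordinate approaches the boundary:

```python
import numpy as np

# As p_i -> 0+, the term -log(p_i) - 1 diverges to +infinity, so no finite
# choice of lambda can make the gradient vanish at a point with a (near-)zero
# coordinate: a Dirac measure is never a stationary point of this Lagrangian.
for p_i in [1e-2, 1e-4, 1e-8, 1e-16]:
    print(p_i, -np.log(p_i) - 1.0)   # grows without bound
```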

Update 2 (following LinAlg's comment and answer): you could try to change your Lagrangian to take the inequalities into account (see, for example, these notes), but then you run into trouble because $x\mapsto x\log x$ is not differentiable at $0$.
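For concreteness, here is a sketch of what such a modified (KKT-style) Lagrangian might look like, with hypothetical multipliers $\mu_i \ge 0$ attached to the constraints $p_i \ge 0$ (this is my illustration of the idea, under the usual sign conventions, not a claim about the linked notes):

$$
\mathcal{L}(p, \lambda, \mu) = - \sum_{i=1}^{10} p_i \log(p_i) + \lambda(1 - \sum^{10}_{i=1}p_i) + \sum^{10}_{i=1}\mu_i p_i, \qquad \mu_i \ge 0.
$$

Stationarity would require $-\log(p_i) - 1 - \lambda + \mu_i = 0$, i.e. $p_i = e^{\mu_i - 1 - \lambda} > 0$, so even with the inequality multipliers the stationarity condition can never produce a zero coordinate; the non-differentiability of $x \log x$ at $0$ is exactly what breaks down at the Dirac measures.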