[Math] Minimise the entropy of a probability vector using Lagrange multipliers

optimization

Problem statement:

The entropy of a probability vector $ p = (p_1, \dots, p_n)^T $ is defined as $ H(p) = -\sum\limits_{i=1}^{n} p_i \log{p_i} $,
subject to $ \sum\limits_{i=1}^{n} p_i = 1 \mbox{ and } p_i \geq 0$, where $ 0\log{0} = 0 $ by convention.
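For concreteness, here is a small numerical sketch of this definition (the function name `entropy` and the use of NumPy are my own choices, not part of the problem); it implements the $0\log{0} = 0$ convention by summing only over the nonzero entries:

```python
import numpy as np

def entropy(p):
    """H(p) = -sum_i p_i log p_i, with the convention 0*log(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                    # dropping zero entries encodes 0*log(0) = 0
    return -np.sum(nz * np.log(nz))

print(entropy([0.25, 0.25, 0.25, 0.25]))  # log(4) ≈ 1.386 (uniform, the maximum for n = 4)
print(entropy([1.0, 0.0, 0.0, 0.0]))      # 0 (a degenerate vector, the minimum)
```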

What are the largest and smallest entropies of a probability vector?

Discussion

I can maximise the expression using Lagrange multipliers without a problem. It's equally easy to prove that the smallest entropy of any probability vector is 0, by noting that entropy is non-negative and providing an example with entropy 0.

However, I would very much like to prove that the minimum value is 0 using Lagrange multipliers (the fact that I can't seems to me to suggest that there is something that I haven't understood).

I set up the Lagrangian: $ L(p, \lambda) = -\sum p_i\log{p_i} + \lambda\left(\sum p_i - 1\right) $ and by differentiating to find stationary points obtain the system: $ \frac{\partial{L}}{\partial{p_i}} = -\log{p_i} - 1 + \lambda = 0 $. The obvious solution to this gives me my maximum point, but I am expecting another solution for the minimum.
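As a sanity check on this system (a sketch with SymPy for a small $n$; the symbol names are my own), the stationarity condition forces $p_i = e^{\lambda - 1}$ for every $i$, and the simplex constraint then pins down $\lambda$:

```python
import sympy as sp

n = 3
lam = sp.Symbol('lambda', real=True)

# Stationarity -log(p_i) - 1 + lambda = 0 gives p_i = exp(lambda - 1) for every i,
# so all coordinates are equal.
p_i = sp.exp(lam - 1)

# The constraint sum_i p_i = 1 becomes n * exp(lambda - 1) = 1.
lam_star = sp.solve(sp.Eq(n * p_i, 1), lam)[0]
print(lam_star)                 # 1 - log(3)
print(p_i.subs(lam, lam_star))  # 1/3: the uniform vector, i.e. the maximum
```

The only stationary point of this system is the uniform vector (the maximum), which is exactly why nothing here points at the minimum.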

  • Am I misusing Lagrange multipliers?
  • Is there another solution to my system that I am missing (maybe involving the "convention" about $ 0\log{0} = 0$)?
  • Do I need to worry about complementary slackness and the condition $p_i \geq 0$ to find the minimum?
  • Is finding this minimum beyond the scope of this method for some reason, and my lecturers are in fact expecting me to resort to the "easy" method described above?

Any hints or answers are welcome; thank you in advance.

Best Answer

I think the standard formulation requires the constraints to be written as $f_j(p)\leq 0$ for a minimization; the corresponding multipliers are then nonnegative, so the Lagrangian is always a lower bound on the cost function.

Minimize $H(p)$

subject to

$\sum_j p_j = 1$; let $\lambda_0$ be the associated multiplier.

$p_j \geq 0$; let $\lambda_j$ be the associated multipliers.

Now to find the optimum candidates:

$\frac{\partial{L}}{\partial{p_j}} = -\log p_j - 1 + \lambda_0 - \lambda_j = 0$

and the KKT conditions:

$\sum_j p_j = 1$

$\sum_j p_j \lambda_j = 0$

Note from this last equation that, since both $\lambda_j$ and $p_j$ are nonnegative, a sum of nonnegative terms can equal zero only if every term is zero, i.e., for each $j$ either $\lambda_j$ or $p_j$ equals zero. This and 'the convention' let you solve for the multipliers, but we'll skip to a faster road:

  1. Look at the indices $i$ with $\lambda_i = 0$.

Stationarity then implies $p_i = e^{\lambda_0 - 1}$, which does not depend on $i$, so every such $p_i$ equals the same constant.

Let $p_i = e^{\lambda_0 - 1} \triangleq 1/k$, where $k$ is the number of nonzero-mass elements (the constraint $\sum_j p_j = 1$ over $k$ equal entries forces the constant to be $1/k$).

  2. The old problem now becomes:

Minimize $-\sum\limits_{i=1}^{k} \frac{1}{k} \log{\frac{1}{k}} = \log k$

Subject to

$ k \in \mathbb{Z},\ k \geq 1 $

And clearly the optimum is achieved at $k = 1$, which gives the least entropic (degenerate) distribution.
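A quick numeric check of this reduced problem (a sketch; the loop bound is an arbitrary choice of mine): the entropy of the uniform distribution on $k$ atoms is $\log k$, which is increasing in $k$, so $k = 1$ attains the minimum value $0$:

```python
import numpy as np

# H(uniform on k atoms) = log(k), increasing in k, so k = 1 is the minimum.
for k in range(1, 6):
    p = np.full(k, 1.0 / k)
    print(k, -np.sum(p * np.log(p)))  # ≈ log(k): 0, 0.693, 1.099, 1.386, 1.609
```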

Note: to be thorough, you should also consider the other, nonzero multipliers, then solve for $\lambda_0$, and then for the remaining multipliers, which have to be equal... But I think this is enough for the question.