Equivalence Between Uniform and Normal Distribution

Tags: entropy, normal-distribution, uniform-distribution

The principle of insufficient reason says that all outcomes are equiprobable when we have no knowledge to suggest otherwise. I understand this, and that it corresponds to the uniform distribution. However, different sources say that this is only true in the discrete case. For continuous distributions, the normal distribution corresponds to maximum entropy. Here is Wikipedia:

the maximum entropy prior on a discrete space, given only that the probability is normalized to 1, is the prior that assigns equal probability to each state. And in the continuous case, the maximum entropy prior given that the density is normalized with mean zero and variance unity is the standard normal distribution.

I cannot understand why the flat line starts bending when we divide it into a continuum of outcomes. What are the expectation (mean, peak) and variance of such a normal distribution? How can a constant converge to a curve as it is divided into more intervals? The normal distribution differs from the uniform in that the former has a peak and, thus, some outcomes are more probable. How can that be based on the equiprobability principle? Where does the variance come from in the continuous case? I read the article http://www.math.uconn.edu/~kconrad/blurbs/analysis/entropypost.pdf on how this is derived, but could not grasp it. Can you explain qualitatively?

Furthermore, I see that they prove Lemma 4.2

$$-\sum_{i=1}^{n} p_i \log{p_i} \leq -\sum_{i=1}^{n} p_i \log{q_i}$$

However, I do not understand how they prove Theorems 3.1-3.3 using it in Chapter 4. They simply choose the distribution $q$ to be uniform, $q_i = 1/n$, in the discrete case and normal in the continuous case, and use the lemma to prove that the entropy of $q$, $h(q) = \sum p_i \log q_i$, is greater than or equal to the entropy $h(p) = \sum p_i \log p_i$ of any distribution $p$. This indeed follows from Lemma 4.2. However, I do not understand two things:

  1. Why do they define the entropy of the distribution $p$ as $h(p) = \sum p_i \log p_i$ but treat the two occurrences of $p$ as independent variables when computing $h(q)$? How can they replace only one entry with $q$ and say that this is the entropy of $q$? IMO, the entropy of $q$ is $\sum q_i \log q_i$, and it is $\neq \sum p_i \log q_i$.
  2. What does this have to do with the uniform (normal) distribution? Couldn't I take $q$ to be any other distribution? Lemma 4.2 would prove that its entropy is greater than the entropy of $p$ just as well! (See the sketch after this list.)
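To make my second point concrete, here is a small numerical sketch I put together (my own illustration, not taken from the paper): the bound of Lemma 4.2 indeed holds for every distribution $q$ I try, not only for the uniform one.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    """h(p) = -sum p_i log p_i (natural log)."""
    return -np.sum(p * np.log(p))

def cross_term(p, q):
    """Right-hand side of Lemma 4.2: -sum p_i log q_i."""
    return -np.sum(p * np.log(q))

n = 5
p = rng.dirichlet(np.ones(n))       # a fixed random distribution p
for _ in range(3):
    q = rng.dirichlet(np.ones(n))   # an arbitrary distribution q
    print(entropy(p), "<=", cross_term(p, q))   # the inequality holds every time
```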

Best Answer

This is only an answer to your first question.

How can they replace only one entry with $q$ and say that this is the entropy of $q$?

In the paper $h(q)$ is not computed this way. The inequality of Lemma 4.2 is used to prove that $h(p) \le \log(n)$, with $h(p) \lt \log(n)$ if $p$ is not the uniform distribution with $p_1=p_2=\ldots=p_n=\frac{1}{n}$.

Lemma 4.2: $$-\sum_{i=1}^{n}p_i \log{p_i} \le -\sum_{i=1}^{n}p_i \log{q_i} \tag{1} $$ Equality holds iff $$p_i=q_i, i=1,\ldots , n \tag{2}$$

$\square$

We know that the entropy is defined by $$h(p)=-\sum_{i=1}^{n}p_i \log{p_i} \tag{3} $$ This can be used to reformulate the inequality of the Lemma as

$$ h(p)\le -\sum_{i=1}^{n}p_i \log{q_i} \tag{4} $$

This is valid for all discrete distributions $q$, so in particular for the uniform distribution with $$q_i=\frac{1}{n}, \quad i=1,\ldots,n \tag{4a} $$ Substituting $\frac{1}{n}$ for $q_i$ gives

$$ h(p)\le \sum_{i=1}^{n}p_i \log{n} = (\log{n}) \cdot \sum_{i=1}^{n}p_i = \log{n} \tag{5} $$

But $\log(n)$ is also $h(q)$ if $q$ is the uniform distribution. This can be checked simply by using the definition of the entropy:

$$h(q)=-\sum_{i=1}^{n}q_i \log{q_i}=-\sum_{i=1}^{n}\frac{1}{n} \log{\frac{1}{n}} = \log{n} \sum_{i=1}^{n}\frac{1}{n} = \log{n} \tag{6} $$ So it follows that for the uniform distribution $q$ $$h(p) \le \log{n} = h(q) \tag{7} $$

Because of $(2)$ and $(6)$, equality in $(7)$ holds exactly if $p$ is the uniform distribution too.
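A small numerical sketch of $(5)$-$(7)$ (my own check, not part of the original proof): for random distributions $p$ on $n=4$ outcomes, $h(p)$ stays strictly below $\log n$, and the uniform distribution attains it.

```python
import numpy as np

rng = np.random.default_rng(1)

def entropy(p):
    """h(p) = -sum p_i log p_i, with the convention 0 log 0 = 0."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

n = 4
print("log n      =", np.log(n))
print("h(uniform) =", entropy(np.full(n, 1.0 / n)))   # equals log n
for _ in range(3):
    p = rng.dirichlet(np.ones(n))                     # random non-uniform p
    print("h(p)       =", entropy(p))                 # strictly below log n
```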

Edit:

Theorem 5.1 states that the continuous probability density on $[a,b]$ with $\mu = \frac{a+b}{2}$ that maximizes the entropy is the uniform distribution $q(x)=\frac{1}{b-a}, x \in [a,b]$. This complies with the principle of indifference for continuous variables found here.

On the whole real line there is no uniform probability density. On the whole real line there is also no continuous probability density with highest entropy, because there are continuous probability densities with arbitrarily high entropies: e.g. the Gaussian distribution has entropy $\frac{1}{2}(1+\log(2 \pi \sigma^2))$, so if we increase $\sigma$ the entropy increases.
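A quick numerical sketch of that formula (my own check): SciPy's built-in entropy of a frozen normal distribution agrees with $\frac{1}{2}(1+\log(2 \pi \sigma^2))$ and grows without bound as $\sigma$ increases.

```python
import numpy as np
from scipy.stats import norm

# Differential entropy of N(0, sigma^2): closed form vs. SciPy's value.
for sigma in [0.5, 1.0, 2.0, 10.0]:
    closed_form = 0.5 * (1 + np.log(2 * np.pi * sigma**2))
    scipy_value = float(norm(scale=sigma).entropy())
    print(f"sigma={sigma:5.1f}  formula={closed_form:.4f}  scipy={scipy_value:.4f}")
```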

Because there is no maximal entropy for continuous densities over $R$, we must have additional constraints, e.g. the constraint that $\sigma$ is fixed and that $\mu$ is fixed. The fact that there is a given finite $\sigma^2$ and $\mu$ makes it intuitively clear to me that values nearer to $\mu$ must have higher probability. If you don't fix $\mu$ then you will get no unique solution: the Gaussian distribution for each real $\mu$ is a solution. This is some kind of "uniformness": every $\mu$ can be used for a solution.
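A small check of that last remark (my own sketch): the differential entropy of $N(\mu,1)$ is the same for every $\mu$, because shifting a density does not change its entropy.

```python
import numpy as np
from scipy.stats import norm

# Shifting N(mu, 1) moves the peak but leaves the differential entropy unchanged.
for mu in [-10.0, 0.0, 3.5]:
    print(mu, float(norm(loc=mu, scale=1.0).entropy()))
```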

Notice that it is crucial to fix $\sigma$ and $\mu$ and to demand $p(x)>0, \forall x \in R$. If you fix other values or change the domain of the density function from $R$ to another one, e.g. $R^+$, you will get other solutions: the exponential distribution, the truncated exponential distribution, the Laplace distribution, the lognormal distribution (Theorems 3.3, 5.1, 5.2, 5.3).
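To make the dependence on the constraints concrete, here is a numerical sketch (the comparison distributions are my own choice, not taken from the theorems): with mean and variance fixed on $R$, the normal density has the largest differential entropy; on $R^+$ with only the mean fixed, the exponential density does.

```python
import numpy as np
from scipy import stats

# Same mean (0) and variance (1) on R -- the normal should have the largest entropy.
on_R = {
    "normal":  stats.norm(),                                          # var = 1
    "laplace": stats.laplace(scale=1 / np.sqrt(2)),                   # var = 2*b^2 = 1
    "uniform": stats.uniform(loc=-np.sqrt(3), scale=2 * np.sqrt(3)),  # var = (b-a)^2/12 = 1
}
for name, dist in on_R.items():
    print(f"{name:8s} var={float(dist.var()):.3f} entropy={float(dist.entropy()):.4f}")

# Same mean (1) on R+ -- the exponential should have the largest entropy.
on_Rplus = {
    "exponential": stats.expon(scale=1.0),                    # mean = 1
    "gamma(a=2)":  stats.gamma(a=2.0, scale=0.5),              # mean = a * scale = 1
    "half-normal": stats.halfnorm(scale=np.sqrt(np.pi / 2)),   # mean = scale * sqrt(2/pi) = 1
}
for name, dist in on_Rplus.items():
    print(f"{name:12s} mean={float(dist.mean()):.3f} entropy={float(dist.entropy()):.4f}")
```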