[Math] Expected centered entropy of the binomial distribution

Tags: asymptotics, binomial distribution, entropy, it.information-theory, pr.probability

In short, the function I am interested in is the following:
$$I_n(p) = \sum_{k=0}^n \binom{n}{k} p^k (1-p)^{n-k} \left[h(p) - h\left(\tfrac{k}{n}\right)\right],$$
where $h(x) \triangleq -x \log x - (1-x) \log (1-x)$ is the standard binary entropy function. I am able to prove that for large $n$ this function attains a local maximum around $p \approx \frac{1.338}{n}$, at which $I_n(p) \approx \frac{0.84}{n}$ when binary logarithms are used. Assuming that $p = \frac{\alpha}{n}$ for $\alpha = O(1)$ and using a Poisson approximation of the binomial distribution, the value $\alpha \approx 1.338$ is the (numerically found) maximizer in
$$\max_{\alpha > 0} \left\{\sum_{z=1}^{\infty} \frac{\alpha^z}{z!} e^{-\alpha} (z \log z) - \alpha \log \alpha\right\}.$$
Note that the latter expression does not contain $n$ anymore; I am interested in large-$n$ asymptotics of $I_n(p)$ which leads to the above, $n$-independent expression.
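For reference, here is a minimal Python sketch (grid search with NumPy; the grid range and the truncation point `zmax` are arbitrary choices) that locates this maximizer numerically:

```python
# Maximize g(alpha) = E_{Z~Poisson(alpha)}[Z log2 Z] - alpha log2 alpha over alpha > 0.
# The terms z = 0, 1 contribute nothing since z log2 z = 0 there.
import numpy as np
from math import lgamma, log2

def g(alpha, zmax=200):
    z = np.arange(2, zmax + 1)
    log_pmf = z * np.log(alpha) - alpha - np.array([lgamma(v + 1) for v in z])
    return float(np.sum(np.exp(log_pmf) * z * np.log2(z))) - alpha * log2(alpha)

alphas = np.linspace(0.5, 3.0, 2501)      # crude grid search; step 0.001
values = np.array([g(a) for a in alphas])
best = int(np.argmax(values))
print(alphas[best], values[best])          # roughly alpha ~ 1.338, value ~ 0.84
```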

What I would like to show is that this point $p = \frac{\alpha}{n}$ is actually not just a local, but a global maximum, i.e., there are no values $p = \omega(\frac{1}{n})$ with even higher asymptotic values of $I_n(p)$ for large $n$. So my question can be formulated as:

Can we prove that for large $n$, $\max\limits_{p \in [0,1]} I_n(p) = (1+o(1))\, I_n\!\left(\tfrac{\alpha}{n}\right)$ with $\alpha \approx 1.338184$?

Numerical inspection of $I_n(p)$ for, say, $n = 100\,000$ shows that $I_n(p)$ scales as $\frac{1}{n}$ and is smooth, has two global maxima at $\frac{\alpha}{n}$ and $1 - \frac{\alpha}{n}$ with value $\approx \frac{0.84}{n}$, and a local minimum at $p = \frac{1}{2}$ with value $\approx \frac{0.72}{n}$ (and two trivial minima at $0$ and $1$ with value $0$). The figure below sketches what $I_n(p)$ (scaled by a factor $n$) looks like for small $n$. As $n$ increases, the function is best described as a flat line at height $0.72$ with two peaks of height $0.84$ close to $0$ and $1$.

[Figure: plot of $n \cdot I_n(p)$ against $p$ for small $n$.]
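These values can be reproduced exactly (up to floating point) with a short Python sketch using SciPy's binomial pmf; the choice $n = 2000$ below is arbitrary:

```python
# Evaluate n * I_n(p) at the claimed peak p = 1.338/n and at p = 1/2, binary logarithms.
import numpy as np
from scipy.stats import binom

def h(x):
    """Binary entropy with the convention 0 log 0 = 0 (handled via clipping)."""
    x = np.clip(np.asarray(x, dtype=float), 1e-300, 1.0)
    y = np.clip(1.0 - x, 1e-300, 1.0)
    return -(x * np.log2(x) + y * np.log2(y))

def I(n, p):
    k = np.arange(n + 1)
    return float(np.sum(binom.pmf(k, n, p) * (h(p) - h(k / n))))

n = 2000
for p in (1.338 / n, 0.5):
    print(p, n * I(n, p))    # roughly 0.84 near the peak, roughly 0.72 at p = 1/2
```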

I've tried various approaches using e.g. Taylor expansions of the entropy, but I am not able to prove that the "order terms" are actually small for arbitrary $p$. Also, a bound based on Jensen's inequality (see Attempt 2 below) is not sharp enough; it is roughly a factor of $2$ too loose to prove the above statement.

Any help would be appreciated!


Attempt 1

For large $n$, the dominating terms in the summation in $I_n$ come from values $k \approx np$. Expanding $h(\frac{k}{n})$ around $k = np$ leads to
$$h\left(\frac{k}{n}\right) = h(p) + \left(\frac{k}{n} - p\right) h'(p) + \frac{1}{2}\left(\frac{k}{n} - p\right)^2 h''(p) + \frac{1}{6}\left(\frac{k}{n} - p\right)^3 h^{(3)}(p) + \dots.$$
Substituting this back into the definition of $I_n(p)$, this leads to
$$I_n(p) = \sum_{k=0}^n \binom{n}{k} p^k (1-p)^{n-k} \left[-\left(\tfrac{k}{n} - p\right) h'(p) - \tfrac{1}{2}\left(\tfrac{k}{n} - p\right)^2 h''(p) - \tfrac{1}{6}\left(\tfrac{k}{n} - p\right)^3 h^{(3)}(p) - \dots\right].$$
Since $\mathbb{E}\left[\tfrac{k}{n}\right] = p$, the first-order term vanishes. The second-order term involves the variance of $\tfrac{k}{n}$, which is $\tfrac{p(1-p)}{n}$, so it contributes $-\tfrac{1}{2n} p(1-p)\, h''(p)$. As $h'(p) = \log_2\left(\tfrac{1-p}{p}\right)$ and $h''(p) = -1/[p(1-p)\ln 2]$, this contribution equals $\tfrac{1}{2n \ln 2} \approx \tfrac{0.72}{n}$, as expected. So we have:
$$I_n(p) = \frac{1}{2n \ln 2} + \sum_{k=0}^n \binom{n}{k} p^k (1-p)^{n-k} \left[- \tfrac{1}{6}\left(\tfrac{k}{n} - p\right)^3 h^{(3)}(p) - \dots\right].$$
What remains is to show that all higher-order terms are small whenever $np(1-p) = \omega(1)$.
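To see why the cubic term itself is harmless in that regime: the third central moment of $\mathrm{Bin}(n,p)$ is $np(1-p)(1-2p)$ and $h^{(3)}(p) = \frac{1-2p}{p^2(1-p)^2 \ln 2}$, so the cubic term equals $-\frac{(1-2p)^2}{6\, n^2 p(1-p) \ln 2}$, which is $o(1/n)$ exactly when $np(1-p) = \omega(1)$; the difficulty is the tail of the expansion. A small Python sketch confirming this closed form (parameter values arbitrary):

```python
# Compare the exact expectation of the cubic Taylor term with its closed form
# -(1-2p)^2 / (6 ln 2 * n^2 p (1-p)); they should agree up to rounding.
import numpy as np
from scipy.stats import binom
from math import log

def cubic_term_exact(n, p):
    k = np.arange(n + 1)
    m3 = float(np.sum(binom.pmf(k, n, p) * (k / n - p) ** 3))  # E[(k/n - p)^3]
    h3 = (1 - 2 * p) / (log(2) * p ** 2 * (1 - p) ** 2)        # h'''(p), binary entropy
    return -m3 * h3 / 6

def cubic_term_closed(n, p):
    return -(1 - 2 * p) ** 2 / (6 * log(2) * n ** 2 * p * (1 - p))

n, p = 4000, 0.1
print(cubic_term_exact(n, p), cubic_term_closed(n, p))
```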


Attempt 2

Using the approach described here, consider the sum
$$S = \sum_{k=0}^n \binom{n}{k} p^k (1-p)^{n-k} \frac{k}{n} \log_2 \frac{k}{n}.$$
First, we can cancel the $\frac{k}{n}$ with factors in the binomial coefficient, and pull out a factor $p$ to obtain
$$S = p \sum_{k=1}^n \binom{n-1}{k-1} p^{k-1} (1-p)^{n-k} \log_2 \frac{k}{n} = p \sum_{k=0}^{n-1} \binom{n-1}{k} p^{k} (1-p)^{n-k-1} \log_2 \frac{k+1}{n}.$$
Applying Jensen's inequality, we obtain
$$S \leq p \log_2\left(\sum_{k=0}^{n-1} \binom{n-1}{k} p^{k} (1-p)^{n-k-1} \frac{k+1}{n}\right) = p \log_2 \frac{(n-1)p + 1}{n}.$$
Pulling out the term $\log_2 p$, we obtain
$$S \leq p \log_2 p + p \log_2\left(1 + \frac{1-p}{np}\right).$$
Applying the same procedure to the other term in $I_n(p)$ leads to
$$I_n(p) = h(p) + \sum_{k=0}^n \binom{n}{k} p^k (1-p)^{n-k} \left[\frac{k}{n} \log_2 \frac{k}{n} + \frac{n-k}{n} \log_2 \frac{n-k}{n}\right] \\
\leq h(p) + \left[p \log_2 p + p \log_2\left(1 + \frac{1-p}{np}\right)\right] + \left[(1-p) \log_2 (1-p) + (1-p) \log_2\left(1 + \frac{p}{n(1-p)}\right)\right] \\
= p \log_2\left(1 + \frac{1-p}{np}\right) + (1-p) \log_2\left(1 + \frac{p}{n(1-p)}\right).$$
Using $\ln(1 + x) \leq x$ for all $x > -1$, this leads to
$$I_n(p) \leq \frac{(1-p) + p}{n \ln 2} = \frac{1}{n \ln 2}.$$
Unfortunately this is a factor of $2$ off, as $I_n(p) \sim \frac{1}{2n \ln 2}$ for almost all $p$.
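A numerical comparison (same exact summation as above, assuming SciPy is available) makes the gap visible: away from the edges, $n\,I_n(p) \approx \frac{1}{2\ln 2} \approx 0.72$, while the bound only gives $n\,I_n(p) \le \frac{1}{\ln 2} \approx 1.44$:

```python
# Exact n * I_n(p) versus the Jensen-based bound n * [p log2(1 + (1-p)/(np))
# + (1-p) log2(1 + p/(n(1-p)))] and its relaxation 1/ln 2.
import numpy as np
from scipy.stats import binom
from math import log, log2

def h(x):
    x = np.clip(np.asarray(x, dtype=float), 1e-300, 1.0)
    y = np.clip(1.0 - x, 1e-300, 1.0)
    return -(x * np.log2(x) + y * np.log2(y))

def I(n, p):
    k = np.arange(n + 1)
    return float(np.sum(binom.pmf(k, n, p) * (h(p) - h(k / n))))

def jensen_bound(n, p):
    return p * log2(1 + (1 - p) / (n * p)) + (1 - p) * log2(1 + p / (n * (1 - p)))

n = 5000
for p in (0.05, 0.3, 0.5):
    print(p, n * I(n, p), n * jensen_bound(n, p), 1 / log(2))
    # exact is ~0.72, both bounds are ~1.44: a factor of 2 too loose
```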

Best Answer

By symmetry I can assume $p\le\frac12$. I will use natural logs.

Put $f(x)=h(p)-h(x)$ and $b_k = \binom{n}{k} p^k(1-p)^{n-k}$. Define $k_0=\lceil pn/2\rceil$ and $k_1=n-k_0$.

The plan is: find polynomials $f_0(x),f_1(x)$ such that $f_0(x)\le f(x)\le f_1(x)$ for $k_0/n\le x\le k_1/n$. Then make bounds on the various parts of
$$\sum_{k=0}^{k_0-1} b_k(f(k/n)-f_0(k/n)) + \sum_{k=0}^n b_k f_0(k/n) + \sum_{k=k_1+1}^n b_k (f(k/n)-f_0(k/n)) \le \sum_{k=0}^n b_k f(k/n) \\ \le \sum_{k=0}^{k_0-1} b_k(f(k/n)-f_1(k/n)) + \sum_{k=0}^n b_k f_1(k/n) + \sum_{k=k_1+1}^n b_k (f(k/n)-f_1(k/n)).$$

Since $f^{(iv)}(x) = 2/x^3+2/(1-x)^3$, we can use Taylor's theorem with remainder to get
$$f_j(x) = (\ln p -\ln(1-p))(x-p) + \frac{(x-p)^2}{2p(1-p)} - \frac{(1-2p)(x-p)^3}{6p^2(1-p)^2} + \frac{j(8-8p+2p^2+p^3)(x-p)^4}{3p^3(2-p)^2}$$
for $j=0,1$. Maple now tells us
$$\sum_{k=0}^n b_k f_j(k/n)=\frac{1}{2n}-\frac{(1-2p)^2}{6p(1-p)n^2} + \frac{Bj}{n^2},$$
where
$$B = \frac{(1-p)^2(8-8p+2p^2+p^3)}{p(2-p)^2} + \frac{(1-p)(8-8p+2p^2+p^3)(1-6p+6p^2)}{3p^2(2-p)^2n}.$$
This much is $1/(2n)+O(1/(pn^2))$.
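Since $f_j$ is a quartic polynomial in $x-p$ and the central moments of the binomial are known exactly, the quoted closed form can be spot-checked numerically, independently of Maple; a Python sketch with arbitrary $n$, $p$ and $j=1$:

```python
# Verify sum_k b_k f_j(k/n) against 1/(2n) - (1-2p)^2/(6p(1-p)n^2) + B j/n^2
# for j = 1 (natural logarithms). f_j is a quartic, so the identity is exact.
import numpy as np
from scipy.stats import binom
from math import log

def f_j(x, p, j):
    d = x - p
    return ((log(p) - log(1 - p)) * d
            + d ** 2 / (2 * p * (1 - p))
            - (1 - 2 * p) * d ** 3 / (6 * p ** 2 * (1 - p) ** 2)
            + j * (8 - 8 * p + 2 * p ** 2 + p ** 3) * d ** 4 / (3 * p ** 3 * (2 - p) ** 2))

def closed_form(n, p, j):
    q = 8 - 8 * p + 2 * p ** 2 + p ** 3
    B = ((1 - p) ** 2 * q / (p * (2 - p) ** 2)
         + (1 - p) * q * (1 - 6 * p + 6 * p ** 2) / (3 * p ** 2 * (2 - p) ** 2 * n))
    return 1 / (2 * n) - (1 - 2 * p) ** 2 / (6 * p * (1 - p) * n ** 2) + B * j / n ** 2

n, p, j = 3000, 0.2, 1
k = np.arange(n + 1)
lhs = float(np.sum(binom.pmf(k, n, p) * f_j(k / n, p, j)))
print(lhs, closed_form(n, p, j))   # the two values agree
```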

In the range $0\le k\le k_0-1$, $b_k$ is dominated by a geometric series with ratio $\frac12$ and $f(x)-f_j(x)=O(p\ln p)$. Using the Stirling approximation for $b_k$, we find that $$\sum_{k=0}^{k_0-1} b_k(f(k/n)-f_j(k/n))=o(1/n)$$ for $p\ge (2+\epsilon)\ln\ln n/n$.

In the range $k_1+1 \le k\le n$, $b_k\le 2^{-3n/4}$ (using the assumption $p\le \frac12$) and $f(x)-f_j(x)=O(p^{-3})$, so this part is negligible compared to the previous part.

In summary, $$\sum_{k=0}^n b_k f(k/n) = \frac{1+o(1)}{2n}$$ for $(2+\epsilon)\ln\ln n/n\le p\le 1-(2+\epsilon)\ln\ln n/n$, any $\epsilon\gt 0$.
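This conclusion is easy to check numerically for $p$ well inside the stated range (natural logarithms, as above; $n = 100\,000$ is an arbitrary choice):

```python
# Check that n * sum_k b_k f(k/n) is close to 1/2 for p away from the edges,
# where f(x) = h(p) - h(x) and h is the entropy in nats.
import numpy as np
from scipy.stats import binom

def h_nats(x):
    x = np.clip(np.asarray(x, dtype=float), 1e-300, 1.0)
    y = np.clip(1.0 - x, 1e-300, 1.0)
    return -(x * np.log(x) + y * np.log(y))

n = 100000
k = np.arange(n + 1)
for p in (0.001, 0.01, 0.1, 0.5):
    val = float(np.sum(binom.pmf(k, n, p) * (h_nats(p) - h_nats(k / n))))
    print(p, n * val)     # each of these is close to 0.5
```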

For the case $p=a/n$ with $a=O(\ln\ln n)$, use $\binom{n}{k}=\frac{n^k}{k!}\left(1-\binom{k}{2}/n -O(k^3/n^2)\right)$ and simple bounds on the tail to find $$\sum_{k=0}^n b_k f(k/n) = \sum_{k=0}^{(\ln n)^2} \frac{a^k}{k!}\left(1-\frac{\binom{k}{2}}{n}-\frac{ak}{n}\right)f(k/n) + O((\ln n)^{O(1)}/n^2).$$
