Upper bound on sum of squares of binomial probabilities

binomial distributionbinomial-coefficientshypergeometric function

I have been trying to establish a meaningful upper bound on the sum of squares of probabilities from the binomial distribution, $\displaystyle\sum_{k=0}^n {n\choose k}^2p^{2k}(1-p)^{2(n-k)}$. This is bounded by 1 can be seen easily. From this post, Sum of squares of Binom(n,p) values, I found that this can be written in terms of hypergeometric function as,

$$\sum_{k=0}^n {n\choose k}^2p^{2k}(1-p)^{2(n-k)}=(1-p)^n {}_2F_1\left(\left.{-n,-n\atop 1}\right|\frac{p^2}{(1-p)^2}\right).$$

I did not know about hypergeometric functions, and this form is probably not useful for me. I am trying to bound it by some order of $n$, say $O(1/\sqrt{n})$, as studied in the post mentioned. Thanks for your help.

Best Answer

The asymptotic result I got was $S_n(p,q)\approx ( 4 n \pi p q)^{-1/2}$.

Derivation

Let $f(x)= (q+p e^{ix})$. It has the property that $(f(x))^n= \sum_k c_k e^{i kx}$ where $c_k=C(n,k) q^{n-k} p^k$.

By using the orthogonality relations for these fourier components $e^{i kx}$ one deduces the Parseval formula

$\frac{1}{2\pi} \int_{-\pi}^{\pi} f^n(x) \overline {f^n(x)} dx= \sum_k c_k^2=S_n$. This is the sum of squared binomials you wish to evaluate.

But some trig identities allow us to simplify $f(x)\overline {f(x)}= (1 - 4pq \sin^2(x/2))$.

Thus we seek an asymptotic estimate for the behavior of the exact expression

$$S_n= \frac{1}{2\pi} \int_{-\pi}^{\pi} ( 1- 4p q \sin^2(x/2))^N dx$$

The nonnegative expression $( 1- 4p q \sin^2(x/2))$ has a unique maximum located at $ x=0$ where it peaks at height $1$. Its minima are at the endpoints of integration, where it takes the value $1-4pq<1$. Note it is roughly bell-shaped. Raising it to high powers preserves these features. The maximum height remains 1, the minimum height drops lower as we increase the powers.

The standard trick is to make a change of variables near the origin so that $ 1- 4p q \sin^2(x/2) =e^{-u^2} $. (Then one can treat this as essentially a Gaussian integral.)

The change of variables is $- u^2 = \ln ( 1- 4p q \sin^2(x/2)) \approx -4p q \sin^2(x/2)$ so $u\approx 2\sqrt{pq} \sin (x/2) \approx \sqrt{pq} x$. That allows us to write $ du \approx \sqrt{pq} dx$

Thus $(1- 4 pq \sin^2(x/2))^n dx =e^{-n u^2} dx \approx \frac{1}{\sqrt{pq}} e^{-n u^2} du$.

Integrating in $u$ we get the result

$$ S_n \approx \frac{1}{2\pi} \frac{1}{\sqrt{pq}} \int e^{- nu^2} du $$ The interval of integration is approximately $|u|< \sqrt{pq}$. The change if variable $ u= v/\sqrt{n}$ converts the last integral to $\int e^{- n u^2} du =\frac{1}{\sqrt{n}}\int _{|v|\leq \sqrt{npq}|} e^{-v^2} dv =\frac{1}{\sqrt{n}} I_n$

where $I_n\to \int_{-\infty}^{\infty} e^{- v^2} dv = {\sqrt{\pi}}$.

Concluding, $S_n\approx \frac{1}{2 \pi} \frac{\sqrt{pi}}{\sqrt{ npq}} =\frac{1}{\sqrt{4 \pi n pq}}$

P.S. Should you desire more accuracy you can expand the change of variables to higher order as $u= \sqrt{p q } x+\frac{x^3 (6 p q-1) \sqrt{p q }}{24 }+O\left(x^5\right)$.

P.P.S. When $n=16$ and $(p,q)=(.8,.2)$ the approximation was accurate t0 3 decimal places: .1776 exact vs. .1773 approx