[Math] Asymptotic probability that two binomial variables are equal

probability

$X_1,\ldots,X_k$ are independent random variables distributed like $\text{Binomial}[n,p]$. What is the probability that they are all equal, as a function of $k$ $p$ and $n$, when $n$ is very large?

Currently I have two solutions.

Solution A is general but quite informal. When $n$ is large, a Binomial random variable behaves like a normal random variable, and most of its probability mass is concentrated in the interval $\mu \pm \sigma$, where $\mu$ is the mean value and $\sigma$ is the standard deviation. So we can approximate the variables as being drawn independently at random from a set with $2\sigma$ elements. After the first variable is determined, each of the remaining $k-1$ variables has probability $\approx 1/(2\sigma)$ to have the same value, so the probability that all are equal is $\approx 1/(2\sigma)^{k-1}$. Here $\sigma=\sqrt{p (1-p) n}$, so the probability is:

$${1 \over (4 p (1-p) n)^{(k-1)/2}} $$

Solution B is more formal but works only for $p=1/2$ and $k=2$ variables. Define $Y_2 := n – X_2$. So the event $X_1=X_2$ is identical to the event $X_1+Y_2 = n$. $Y_2$ is distributed like $\text{Binomial}[n,1-p]$, but here $p=1-p$ so it is distributed like $X_1$. Therefore $X_1+Y_2$ is distributed like $\text{Binomial}[2n,1/2]$ and the probability that it equals $n$ is just: ${2n\choose n}2^{-2n}$. By Stirling's approximation, the binomial coefficient is $\approx {4^n \over \sqrt{\pi n}}$, so the probability that $X_1=X_2$ is approximately:

$$ {1\over \sqrt{\pi n}}$$

This is a constant times the outcome of Solution A, which means that both solutions are at least asymptotically correct.

What is a solution that works for every $p$ and $k$?

EDIT: See also: What is the probability that two univariate Gaussian random variables are equal?

Best Answer

The Binomial $B_{n,p}(i)$, for large $n$, tends to a normal $N_{\mu,\sigma^2}(x)$ with $\mu=np$ and $\sigma^2 =n p q$ ($q=1-p$). Hence the desired probability $$P_{n,p,k}=\sum_{i=0}^n \left( {n \choose i}\, p^i (1-p)^{n-i} \right)^k$$ can be approximated, one hopes, by the integral

$$ P_{n,p,k} \approx I_{n,p,k}=\int (2 \pi \sigma^2)^{-k/2} \exp\left(- \frac{k(x-\mu)^2}{2 \sigma^2}\right) dx=\\ =(2 \pi \sigma^2)^{-k/2} \sqrt{\frac{2 \pi \sigma^2}{k}}=\frac{1}{\sqrt{k \, (2 \pi n p q)^{k-1}}}$$