Is the proof that a sequence of binomial variables with fixed mean does not converge in probability to the Poisson distribution correct?

probability, probability-distributions, probability-theory, solution-verification

$\newcommand{\pmf}{\operatorname{pmf}}\newcommand{\d}{\,\mathrm{d}}\newcommand{\pr}{\operatorname{Pr}}$It is not hard to see that the probability mass function of a binomially distributed variable with mean $\lambda$, $B_n\sim\mathcal{B}\left(\frac{\lambda}{n},n\right)$, converges pointwise to the probability mass function of the Poisson distribution with mean $\lambda$.
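For concreteness, here is a quick numerical check of this pointwise convergence (a minimal sketch using scipy; the choice $\lambda=4$ and the grid of $n$ values are arbitrary):

```python
# Pointwise convergence of the Binomial(n, lam/n) pmf to the Poisson(lam) pmf.
from scipy.stats import binom, poisson

lam = 4.0
for n in (10, 100, 1000, 10000):
    # Largest pointwise pmf difference over k = 0..30 (the bulk of the mass for lam = 4).
    err = max(abs(binom.pmf(k, n, lam / n) - poisson.pmf(k, lam)) for k in range(31))
    print(f"n = {n:>6}: max |pmf difference| = {err:.2e}")
```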

However, I notice that Wikipedia phrases convergence of random variables in the following three ways:

$X_n$ converges to $X$ in distribution if, at all points $x$ at which the cumulative distribution function $F$ is continuous: $$\lim_{n\to\infty}F_n(x)=F(x)$$

$X_n$ converges to $X$ in probability if, $\forall\epsilon\gt0$: $$\lim_{n\to\infty}\pr(|X_n-X|\gt\epsilon)=0$$

$X_n$ converges to $X$ almost surely if: $$\pr(\lim_{n\to\infty}X_n=X)=1$$

But these limits seem non-trivial to compute in practice, so I decided to investigate in what sense the binomial distribution with fixed mean converges to the Poisson.

Let $\lambda\in\Bbb R^+$:

$$\forall n\gt\lambda,\,B_n\sim\mathcal{B}\left(\frac{\lambda}{n},n\right)$$

And let $\Lambda$ have the Poisson distribution with mean $\lambda$, independent of each $B_n$ (independence is what justifies the convolution formula below).

I will compute the p.m.f. of the variable $\Lambda-B_n$ using the convolution formula, for $m\in\Bbb N_0$:

$$\begin{align}\pmf_{\Lambda-B_n}(m)&=\sum_{k\in\Bbb Z}\pmf_{\Lambda}(k+m)\pmf_{B_n}(k)\\&=\sum_{k=0}^{n}e^{-\lambda}\frac{\lambda^{k+m}}{(k+m)!}\binom{n}{k}\left(\frac{\lambda}{n}\right)^k\left(1-\frac{\lambda}{n}\right)^{n-k}\\&=n!\cdot\lambda^me^{-\lambda}\left(1-\frac{\lambda}{n}\right)^n\cdot\sum_{k=0}^{n}\frac{\lambda^{2k}}{n^k}\cdot\frac{1}{k!(k+m)!(n-k)!}\left(1-\frac{\lambda}{n}\right)^{-k}\end{align}$$

And for $m\in\Bbb Z$ generally:

$$\pmf_{\Lambda-B_n}(m)=\begin{cases}n!\cdot\lambda^me^{-\lambda}\left(1-\frac{\lambda}{n}\right)^n\cdot\sum_{k=0}^{n}\frac{\lambda^{2k}}{n^k}\cdot\frac{1}{k!(k+m)!(n-k)!}\left(1-\frac{\lambda}{n}\right)^{-k}&m\ge0\\n!\cdot\lambda^me^{-\lambda}\left(1-\frac{\lambda}{n}\right)^n\cdot\sum_{k=|m|}^{n}\frac{\lambda^{2k}}{n^k}\cdot\frac{1}{k!(k+m)!(n-k)!}\left(1-\frac{\lambda}{n}\right)^{-k}&m\lt0\end{cases}$$

Numerical computations confirm this indeed sums to $1$ over the domain $\Bbb Z$.
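For reference, a sketch of such a check (it evaluates the convolution directly from the first line of the derivation above rather than from the closed form, so it also sanity-checks the algebra; the values of $\lambda$ and $n$ are arbitrary):

```python
# Check that the pmf of Lambda - B_n sums to 1 over the integers.
# Computed as the convolution sum_k pmf_Lambda(k + m) * pmf_{B_n}(k).
from scipy.stats import binom, poisson

lam, n = 3.0, 50

def pmf_diff(m):
    # B_n is supported on {0, ..., n}; the Poisson factor forces k + m >= 0.
    return sum(poisson.pmf(k + m, lam) * binom.pmf(k, n, lam / n)
               for k in range(max(0, -m), n + 1))

# Lambda - B_n is supported on {-n, -n+1, ...}; the upper tail beyond
# m = 100 is negligible for lam = 3.
total = sum(pmf_diff(m) for m in range(-n, 101))
print(f"total mass = {total:.12f}")  # numerically 1
```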

It then follows (interchanging the limit with the probability, a step I have not justified rigorously):

$$\begin{align}\pr(\lim_{n\to\infty}B_n=\Lambda)&=\pr(\Lambda-\lim_{n\to\infty}B_n=0)\\&=\lim_{n\to\infty}\pmf_{\Lambda-B_n}(0)\\&=\lim_{n\to\infty}e^{-\lambda}\left(1-\frac{\lambda}{n}\right)^n\cdot\sum_{k=0}^n\binom{n}{k}\frac{\lambda^{2k}}{n^k\,k!}\left(1-\frac{\lambda}{n}\right)^{-k}\\&\ge\lim_{n\to\infty}e^{-\lambda}\left(1-\frac{\lambda}{n}\right)^n\\&=e^{-2\lambda}\\&\gt0\end{align}$$

(The inequality keeps only the $k=0$ term of the sum, all terms being nonnegative for $n\gt\lambda$.)

It is clear from this that the variables $B_n$ do not converge to $\Lambda$ almost surely, or even in probability, and $B_n\to\Lambda$ in distribution only. Is this correct?

Moreover, what are the implications of this? I don't study probability, so I don't have an intuition for how "bad" this is; before investigating, I always thought the Poisson distribution was the "limit" of the binomial distribution, but now I'm not so sure, since the variables themselves are quite different: numerical experiments suggest that (for small $\lambda$) $\pr(\lim_{n\to\infty}B_n=\Lambda)\approx0.095$, which is remarkably high.
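For reference, the kind of numerical experiment mentioned above can be sketched as follows (it evaluates $\pr(B_n=\Lambda)=\pmf_{\Lambda-B_n}(0)$ for increasing $n$, assuming $\Lambda$ and $B_n$ independent; the choice of $\lambda$ is arbitrary):

```python
# Evaluate Pr(B_n = Lambda) = pmf_{Lambda - B_n}(0) = sum_k pmf_Lambda(k) * pmf_{B_n}(k)
# for increasing n, assuming Lambda and B_n are independent.
import numpy as np
from scipy.stats import binom, poisson

lam = 1.0
for n in (10, 100, 1000, 10000):
    k = np.arange(n + 1)
    p_equal = float(np.sum(poisson.pmf(k, lam) * binom.pmf(k, n, lam / n)))
    print(f"n = {n:>6}: Pr(B_n = Lambda) = {p_equal:.6f}")
```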

Best Answer

You are right that convergence in probability is a stronger condition, and it does not hold here. Here's some intuition. If you simulate $1000$ draws from $B_n$ and from $\Lambda$ and plot the two histograms, you will see that they are quite similar. This is because $B_n$ converges to $\Lambda$ in distribution. In practice this is a useful fact, because Poisson distributions can be easier to work with.
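A minimal sketch of that simulation (it prints empirical frequencies rather than plotting; the choices of $\lambda$, $n$, and the sample size are arbitrary):

```python
# Simulate 1000 draws each from B_n and Lambda and compare empirical frequencies.
import numpy as np

rng = np.random.default_rng(0)
lam, n, draws = 4.0, 1000, 1000

b = rng.binomial(n, lam / n, size=draws)  # B_n ~ Binomial(n, lam/n)
p = rng.poisson(lam, size=draws)          # Lambda ~ Poisson(lam)

for k in range(11):
    print(f"k = {k:>2}: B_n freq = {np.mean(b == k):.3f}   Lambda freq = {np.mean(p == k):.3f}")
```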

On the other hand, convergence in probability requires that realizations of $B_n$ and $\Lambda$ be close to one another with high probability. There is considerable variability in both $B_n$ and $\Lambda$, so there is no reason to expect a pair of independent draws to yield identical, or even nearby, values. This is exactly what you saw when you calculated the PMF of $\Lambda - B_n$.
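To see this numerically, you can estimate $P(B_n = \Lambda)$ for independent draws and watch it stay bounded away from $1$ as $n$ grows; a sketch (parameter choices arbitrary):

```python
# Monte Carlo estimate of Pr(B_n = Lambda) for independent draws: it stays
# bounded away from 1 as n grows, so Pr(|B_n - Lambda| > eps) does not vanish.
import numpy as np

rng = np.random.default_rng(0)
lam, draws = 4.0, 100_000

for n in (10, 100, 1000, 10000):
    b = rng.binomial(n, lam / n, size=draws)
    p = rng.poisson(lam, size=draws)
    print(f"n = {n:>6}: estimated Pr(B_n = Lambda) = {np.mean(b == p):.4f}")
```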

Here is another example that highlights the difference. Let $X_1, X_2, \ldots$ and $X$ be i.i.d. standard normal random variables. Then $X_n$ converges in distribution to $X$ trivially, because every $X_n$ has the same CDF as $X$. However, $X - X_n \sim \mathcal{N}(0, 2)$ for every $n$, so $P(|X - X_n| > \varepsilon)$ is a positive constant that does not depend on $n$ (and approaches $1$ as $\varepsilon \to 0$); hence $X_n$ does not converge in probability to $X$.
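Concretely, since $X - X_n \sim \mathcal{N}(0, 2)$, this probability can be computed exactly; a minimal sketch (the values of $\varepsilon$ are arbitrary):

```python
# For i.i.d. standard normals, X - X_n ~ N(0, 2), so P(|X - X_n| > eps) is the
# same for every n and tends to 1 as eps -> 0.
from math import sqrt
from scipy.stats import norm

for eps in (1.0, 0.1, 0.01):
    prob = 2 * (1 - norm.cdf(eps / sqrt(2)))  # two-sided tail of N(0, 2)
    print(f"eps = {eps:>4}: P(|X - X_n| > eps) = {prob:.4f}")
```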