Solved – Second order inclusion probabilities in With-Replacement Sampling

coverage-probability, sampling

I'm reading the book "Model Assisted Survey Sampling" by Särndal et al.
In chapter 2, there's a section about sampling with replacement. To put this into context:
We have $m$ independent draws such that, in every draw, each of the $N$ population elements has the same selection probability, $\frac{1}{N}$.

Once drawn, an element is replaced into the population so that all $N$ elements participate in each draw. Obviously, the probability that any given element is not drawn at all is given by: $(1 - \frac{1}{N})^m$

So, the first-order inclusion probability is: $\pi_k = 1 - (1- \frac{1}{N})^m$

Now, my particular question is why the second-order inclusion probability is:

$\pi_{kl} = 1 - 2(1- \frac{1}{N})^m + (1- \frac{2}{N})^m$

I really don't understand why. Is this supposed to mean that $2(1- \frac{1}{N})^m - (1- \frac{2}{N})^m$ is the probability that neither element $k$ nor element $l$ is drawn in the $m$ draws?

If someone has an intuitive explanation, I would be very thankful.

Best Answer

The second-order inclusion probability is defined as the probability that both item $i$ and item $j$ (with $j \ne i$) are in the sample. Here $i$ and $j$ play the role of $k$ and $l$ in the question.

By the inclusion-exclusion principle (https://en.wikipedia.org/wiki/Inclusion%E2%80%93exclusion_principle),

P(i and j both in the sample) = P(i in the sample) + P(j in the sample) - (1 - P(neither i nor j in the sample))
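In event notation, the identity above can be written out as follows. The symbols $A_i$ and $A_j$ are labels introduced here for illustration (they are not from the book): $A_i$ is the event that item $i$ appears in the sample at least once, and $\bar{A}_i$ is its complement.

$$
\begin{aligned}
P(A_i \cap A_j) &= P(A_i) + P(A_j) - P(A_i \cup A_j), \\
P(A_i \cup A_j) &= 1 - P(\bar{A}_i \cap \bar{A}_j).
\end{aligned}
$$

The second line holds because "at least one of $i$, $j$ is in the sample" is the complement of "neither $i$ nor $j$ is in the sample".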

You already know that P(i in the sample) = P(j in the sample) = $1 - (1- \frac{1}{N})^m$

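Each draw misses both i and j with probability $\frac{N-2}{N} = 1 - \frac{2}{N}$, and the $m$ draws are independent, so P(neither i nor j in the sample) = $(1- \frac{2}{N})^m$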

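Putting it all together gives P(i and j both in the sample) = $1 - 2(1- \frac{1}{N})^m + (1- \frac{2}{N})^m$

Note that $2(1- \frac{1}{N})^m - (1- \frac{2}{N})^m$ is the complement of this, that is, the probability that at least one of i and j is missing from the sample. It is not the probability that neither is drawn, which is $(1- \frac{2}{N})^m$.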
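As a sanity check, the closed form can be compared against a brute-force enumeration of all $N^m$ equally likely ordered draw sequences for small $N$ and $m$. This is only a minimal sketch: the function names `pi_k`, `pi_kl`, and `pi_kl_enumerated` are made up for this illustration, items are labelled $0, \dots, N-1$, and exact fractions are used so the comparison does not depend on floating-point rounding.

```python
from fractions import Fraction
from itertools import product


def pi_k(N, m):
    """First-order inclusion probability: 1 - (1 - 1/N)^m."""
    return 1 - Fraction(N - 1, N) ** m


def pi_kl(N, m):
    """Second-order inclusion probability: 1 - 2(1 - 1/N)^m + (1 - 2/N)^m."""
    return 1 - 2 * Fraction(N - 1, N) ** m + Fraction(N - 2, N) ** m


def pi_kl_enumerated(N, m, k=0, l=1):
    """Brute force: share of the N^m ordered draw sequences containing both k and l."""
    hits = sum(
        1
        for draws in product(range(N), repeat=m)
        if k in draws and l in draws
    )
    return Fraction(hits, N ** m)


# Small case that can also be checked by hand: N = 3, m = 2.
# Only the sequences (k, l) and (l, k) contain both items, so the answer is 2/9.
small_case = pi_kl(3, 2)
```

For $N = 3$ and $m = 2$ both functions give $\frac{2}{9}$, which matches the hand count of the two sequences $(k, l)$ and $(l, k)$ out of $9$. The enumeration grows as $N^m$, so it is only practical for small values.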