A more in-depth explanation of the capture/recapture method

combinatorics, statistics

I am reading these lecture notes from MIT Introduction To Probability And Statistics.

I am trying to understand the capture/recapture method (see Example 6).

I will rewrite it here:

The capture/recapture method is a way to estimate the size of a population in the wild. The method assumes that each animal in the population is equally likely to be captured by a trap.

Suppose 10 animals are captured, tagged and released. A few months later, 20 animals are captured, examined, and released. 4 of these 20 are found to be tagged. Estimate the size of the wild population using the MLE for the probability that a wild animal is tagged.

answer: Our unknown parameter $n$ is the number of animals in the wild. Our data is that 4 out of 20 recaptured animals were tagged (and that there are 10 tagged animals). The likelihood function is
$$P(\text{data | $n$ animals}) = \frac{\binom{n-10}{16}\binom{10}{4}}{\binom{n}{20}}$$
(The numerator is the number of ways to choose 16 animals from among the $n-10$ untagged ones times the number of ways to choose 4 out of the 10 tagged animals. The denominator is the number of ways to choose 20 animals from the entire population of $n$.) We can use R to compute that the likelihood function is maximized when $n = 50$. This should make some sense. It says our best estimate is that the fraction of all animals that are tagged is 10/50, which equals the fraction of recaptured animals that are tagged.
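The maximization can be checked numerically. The notes use R; the following is a minimal Python sketch instead, using exact fractions so that ties are not hidden by rounding. The names `likelihood`, `TAGGED`, `RECAPTURED`, `TAGGED_SEEN` and the search limit of 500 are my own choices for illustration, not something taken from the notes.

```python
from fractions import Fraction
from math import comb

TAGGED = 10       # animals tagged in the first capture
RECAPTURED = 20   # animals caught in the second capture
TAGGED_SEEN = 4   # tagged animals among the recaptured ones


def likelihood(n):
    """P(4 of the 20 recaptured animals are tagged | n animals), exactly."""
    untagged_seen = RECAPTURED - TAGGED_SEEN
    ways = comb(n - TAGGED, untagged_seen) * comb(TAGGED, TAGGED_SEEN)
    return Fraction(ways, comb(n, RECAPTURED))


# n must be at least 26: the 10 tagged animals plus the 16 untagged ones seen.
# The upper limit of 500 is an arbitrary search bound (an assumption).
values = {n: likelihood(n) for n in range(26, 501)}
best = max(values.values())
maximizers = [n for n, v in values.items() if v == best]
print(maximizers)
```

Run as written, this prints `[49, 50]`. The ratio of consecutive likelihoods is $\frac{(n-10)(n-20)}{n(n-26)}$, which equals 1 at $n = 50$, so $n = 49$ and $n = 50$ tie for the maximum; $n = 50$ is the one that matches the proportion argument in the notes.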

Can you explain how they get this formula?
$$P(\text{data | $n$ animals}) = \frac{\binom{n-10}{16}\binom{10}{4}}{\binom{n}{20}}$$

I think they are making use of the rule of product, i.e. if there are $k$ ways to perform action 1 and $m$ ways to perform action 2, then there are $k\cdot m$ ways to perform action 1 followed by action 2.

In the example above, action 1 is choosing 16 animals from the untagged part of the population, and action 2 is choosing 4 animals from the tagged part.

My problem with this is that the order in which we perform these actions doesn't matter, so aren't we overcounting by a factor of 2?

Best Answer

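No, there is no overcounting. The numerator and denominator both count unordered selections (combinations), so they are consistent. The two actions choose from disjoint groups, so every sample of 20 animals corresponds to exactly one pair (a set of 16 untagged animals, a set of 4 tagged animals), and performing the actions in the other order produces the same pair, not a new one.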

Let me illustrate with a simple example of drawing without replacement: find the probability that exactly $2$ red marbles and $3$ blue marbles are drawn when $5$ marbles are taken from a total of $4$ red and $8$ blue marbles.

The combination method gives $\Large\frac{\binom42\binom83}{\binom{12}5}=\frac{6\cdot 56}{792}=\frac{14}{33}$.
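As a sanity check on the combination count, here is a small Python sketch that lists every unordered hand of 5 marbles and counts the ones with exactly 2 reds. The variable names (`marbles`, `hands`, `favourable`) are just illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Give every marble its own index so each one is distinguishable: 4 red, 8 blue.
marbles = ["R"] * 4 + ["B"] * 8

# combinations() over the indices yields each unordered 5-marble hand exactly once.
hands = list(combinations(range(len(marbles)), 5))
favourable = sum(
    1 for hand in hands if sum(marbles[i] == "R" for i in hand) == 2
)

# The brute-force count agrees with the product of the two binomials.
assert favourable == comb(4, 2) * comb(8, 3)
print(favourable, len(hands), Fraction(favourable, len(hands)))
```

This prints `336 792 14/33`. Each hand is counted once, regardless of whether you think of picking the reds first or the blues first, which is why no extra factor of 2 appears.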

But if you multiply the probabilities draw by draw, you get
$\Large\frac4{12}\frac3{11}\frac8{10}\frac79\frac68 = \frac7{165}$, and you need to multiply by $\binom52 = 10$ to get the right answer, because the numerator counts only one particular order (red, red, blue, blue, blue), while the denominator counts all possible orders.
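The factor $\binom52$ can be seen directly by going through every colour order. The sketch below (with hypothetical names such as `pattern_probability`) computes the draw-by-draw probability for each of the possible positions of the 2 red marbles among the 5 draws.

```python
from fractions import Fraction
from itertools import combinations

RED, BLUE, DRAWS = 4, 8, 5


def pattern_probability(red_positions):
    """Probability of one specific colour order, drawing without replacement."""
    red_left, blue_left = RED, BLUE
    prob = Fraction(1)
    for draw in range(DRAWS):
        total_left = red_left + blue_left
        if draw in red_positions:
            prob *= Fraction(red_left, total_left)
            red_left -= 1
        else:
            prob *= Fraction(blue_left, total_left)
            blue_left -= 1
    return prob


# Every way of placing the 2 red draws among the 5 positions.
patterns = list(combinations(range(DRAWS), 2))
per_pattern = [pattern_probability(set(p)) for p in patterns]
total = sum(per_pattern)
print(len(patterns), set(per_pattern), total)
```

There are 10 patterns, each with probability 7/165, and they add up to 14/33, in agreement with the combination method.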
