Sampling without replacement and the hypergeometric distribution

probability, real-analysis, statistical-inference, statistics

In the following example, why does the hypergeometric distribution alone represent sampling without replacement? Doesn't the distribution of each of the $X_i$ matter? I thought that instead, we sample according to the distribution of $X_1$, then sample from the remaining variables according to the same distribution conditioned on the fact that we just chose this value of $X_1$. From this example, it seems that regardless of the distribution of the $X_i$, we will always model the sampling with the hypergeometric distribution. Where am I going wrong in my reasoning?

Example 5.1.3 (Finite population model) As an example of an approximate calculation using independence, suppose $\{1, \ldots, 1000\}$ is the finite population, so $N=1000$. A sample of size $n=10$ is drawn without replacement. What is the probability that all ten sample values are greater than 200? If $X_{1}, \ldots, X_{10}$ were mutually independent we would have
$$
\begin{aligned}
P\left(X_{1}>200, \ldots, X_{10}>200\right) &=P\left(X_{1}>200\right) \cdots P\left(X_{10}>200\right) \\
&=\left(\frac{800}{1000}\right)^{10}=.107374
\end{aligned}
$$

To calculate this probability exactly, let $Y$ be a random variable that counts the number of items in the sample that are greater than 200. Then $Y$ has a hypergeometric $(N=1000, M=800, K=10)$ distribution. So
$$
\begin{aligned}
P\left(X_{1}>200, \ldots, X_{10}>200\right) &=P(Y=10) \\
&=\frac{\binom{800}{10}\binom{200}{0}}{\binom{1000}{10}} \\
&=.106164
\end{aligned}
$$

Thus, (5.1.4) is a reasonable approximation to the true value.
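As a sanity check (my own addition, not part of the textbook example), both numbers above can be reproduced with Python's standard library, using the setup of Example 5.1.3:

```python
from math import comb

N, M, n = 1000, 800, 10  # population size, items > 200, sample size

# Independence approximation: treat the ten draws as i.i.d.
approx = (M / N) ** n

# Exact: hypergeometric P(Y = 10) = C(800,10) C(200,0) / C(1000,10)
exact = comb(M, n) * comb(N - M, 0) / comb(N, n)

print(f"approximate: {approx:.6f}")  # 0.107374
print(f"exact:       {exact:.6f}")   # 0.106164
```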

Best Answer

I think I understand your dilemma, as I spent a while wondering about it myself. The situation I had involved binary results, so I was naturally drawn to exploring the binomial distribution. As you know, with the binomial distribution you not only count the number of orderings of true and false events but also multiply by the probability of each ordering. With the hypergeometric, though, it can feel like you "ignore" the individual probabilities and only count combinations.

In actuality, the individual probabilities are taken into account in the derivation of the hypergeometric formula. Suppose the total number of items is N, of which M are true, so that the number of initial false items is N-M. Let n be the number of items you choose and X the desired number of true items among the n draws.

I will show you how you can derive the formula from observations.

Say you do 3 trials, so n=3, and imagine M=15 and N=20. Say you want to find the probability that you get exactly 1 true result; you could have the following orderings:

true, false, false : $\frac {15}{20} \cdot \frac {5}{19} \cdot \frac {4}{18}$

or

false, true, false : $\frac {5}{20} \cdot \frac {15}{19} \cdot \frac {4}{18}$

or

false, false, true : $\frac {5}{20} \cdot \frac {4}{19} \cdot \frac {15}{18}$

Each of the 3 orderings has the same probability: $\frac {15 \cdot 5 \cdot 4}{20 \cdot 19 \cdot 18}$, and together they give P(X=1).
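This claim can be checked mechanically (my own sketch, using exact fractions): enumerate every ordering of one true and two false draws with M=15, N=20, n=3, confirm they all have the same probability, and check that their sum matches the hypergeometric value.

```python
from fractions import Fraction
from itertools import permutations
from math import comb

N, M, n = 20, 15, 3  # population, true items, draws

def ordering_prob(order):
    """Probability of a specific sequence of true/false draws without replacement."""
    true_left, total_left = M, N
    p = Fraction(1)
    for is_true in order:
        if is_true:
            p *= Fraction(true_left, total_left)
            true_left -= 1
        else:
            p *= Fraction(total_left - true_left, total_left)  # false items left
        total_left -= 1
    return p

orders = set(permutations([True, False, False]))   # the 3 distinct orderings
probs = [ordering_prob(o) for o in orders]

assert len(set(probs)) == 1                        # every ordering is equally likely
total = sum(probs)                                 # P(X = 1)
hyper = Fraction(comb(M, 1) * comb(N - M, 2), comb(N, n))
assert total == hyper
print(total)                                       # 5/38
```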

Hence we know order does not matter. You will also notice that there are 3 orderings, which matches n choose X, since 3 choose 1 = 3. Because order does not matter, we can multiply the number of orderings, n choose X, by the probability of X true results in a single ordering (just like the binomial distribution). Now all we need is a formula for the probability of a single ordering. From the example above, and through one's own experimentation, one can see that the denominator will be:

$ \frac {N!} {(N-n)!}$

Which is: ${}_N \mathrm{ P }_n$

(remember we can say this because the order of true and false results does not affect the end probability).

Now we just need the product contributed by the true results (in the example this is just 15, since X=1; one can tell that if X=3 with M=15, this factor would be 15 * 14 * 13). Because we are looking at joint probabilities, everything is multiplication and the positions of the true results among the trials do not matter, so the numerator will be:

Decreasing values of remaining true items * decreasing values of remaining false items = $\frac {M!} {(M-X)!} \cdot \frac {(N-M)!}{((N-M)-(n-X))!} = ({}_M\mathrm{ P }_X)\,({}_{N-M}\mathrm{ P }_{n-X})$

Combining the numerator and denominator we get that the probability of X=x in a single combination is:

$\frac {({}_M\mathrm{ P }_X)({}_{N-M}\mathrm{ P }_{n-X})}{{}_N \mathrm{ P }_n }$

Hence;

P(X) = ${}_n\mathrm{ C }_X \cdot \frac {({}_M\mathrm{ P }_X)({}_{N-M}\mathrm{ P }_{n-X})}{{}_N \mathrm{ P }_n }$

With some rearranging:

P(x) = $\frac { \frac {n!}{(n-X)!X!} * \frac {M!}{(M-X)!}* \frac {(N-M)!}{((N-M)-(n-X))!}} {\frac {N!} {(N-n)!}} $

= $\frac { \frac {M!}{(M-X)!X!}* \frac {(N-M)!}{((N-M)-(n-X))!(n-X)!}} {\frac {N!} {(N-n)!n!}} $

Now we can express this in terms of combinations:

= $\frac {{}_M\mathrm{ C }_X * {}_{N-M}\mathrm{ C }_{n-X}} {{}_N\mathrm{ C }_n} $
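To double-check the rearrangement (my own verification, not part of the answer), one can confirm with exact arithmetic that the permutation form and the combination form agree, and that the resulting probabilities sum to 1:

```python
from fractions import Fraction
from math import comb, perm

def hyper_perm_form(N, M, n, x):
    # nCx * (M P x)(N-M P n-x) / (N P n)
    return Fraction(comb(n, x) * perm(M, x) * perm(N - M, n - x), perm(N, n))

def hyper_comb_form(N, M, n, x):
    # (M C x)(N-M C n-x) / (N C n)
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))

N, M, n = 20, 15, 3  # the example's values
for x in range(n + 1):
    assert hyper_perm_form(N, M, n, x) == hyper_comb_form(N, M, n, x)

# The probabilities over all x form a valid distribution
assert sum(hyper_comb_form(N, M, n, x) for x in range(n + 1)) == 1
print(hyper_comb_form(N, M, n, 1))  # 5/38
```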

So, to answer the general gist of your question: the somewhat ironic reason we do not have to track the individual probabilities is that the order of the trials does not matter in the derivation, even though each individual probability depends on the previous draws.

I hope this helped and answered your question.
