Difference between binomial and hypergeometric distributions

probability, probability distributions

I have trouble picking which distribution to use to solve a probability problem. Take the following exercise as an example:

Suppose that out of $500$ lottery tickets sold, $200$ pay off at least
the cost of the ticket. Now suppose that you buy $5$ tickets. Find the
probability that you will win back at least the cost of $3$ tickets.

I used the hypergeometric distribution to solve it, but the solution manual uses a binomial distribution. I chose the hypergeometric distribution because I don't think these trials are independent with a fixed probability: for example, I have a $200/500$ chance that the first ticket I pick wins back its cost, but a $199/499$ chance for the second one if the first did, and so on.
This reasoning made me question almost all of the binomial distribution problems I have done before, and I am lost now. Any help would be appreciated.

Best Answer

Which model you use depends on whether the $500$ tickets represent the entire population of tickets sold, or whether the problem merely states that out of every $500$ tickets sold, $200$ pay off at least the cost of the ticket, with the total number of tickets being much larger than $500$. I tend to favor the former interpretation, because it would be rather odd to state the proportion of "non-losing" tickets as $200$ out of $500$ when $2$ out of $5$ would have sufficed.

Let's look at the hypergeometric model. We have a population of $N = 500$ tickets of which $n = 200$ are "non-losing." You choose $m = 5$ tickets from this population and are interested in the random number $X$ of non-losing tickets occurring among the five. Thus $$\Pr[X \ge 3] = \sum_{x=3}^5 \Pr[X = x] = \sum_{x=3}^5 \frac{\binom{n}{x}\binom{N-n}{m-x}}{\binom{N}{m}} = \sum_{x=3}^5 \frac{\binom{200}{x}\binom{300}{5-x}}{\binom{500}{5}} = \frac{1010589063}{3190558595} \approx 0.316744.$$
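As a check on the arithmetic above, here is a minimal sketch that evaluates the hypergeometric tail with exact rational arithmetic. The function name `hypergeom_tail` and its argument order are my own choices for illustration; only the Python standard library is assumed.

```python
from fractions import Fraction
from math import comb

def hypergeom_tail(N, n, m, k):
    """P[X >= k] when m tickets are drawn without replacement
    from N tickets, n of which are non-losing."""
    total = comb(N, m)  # number of equally likely m-ticket hands
    upper = min(n, m)   # X cannot exceed either n or m
    return sum(
        Fraction(comb(n, x) * comb(N - n, m - x), total)
        for x in range(k, upper + 1)
    )

p_hyper = hypergeom_tail(500, 200, 5, 3)
print(p_hyper, float(p_hyper))
```

The result is an exact fraction, so it can be compared directly with the closed-form value quoted above.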

Now, consider the binomial model. Here $n$ denotes the number of trials rather than the number of non-losing tickets: we would have $n = 5$ tickets drawn, each with non-losing probability $p = 2/5 = 0.4$, hence $$\Pr[X \ge 3] = \sum_{x=3}^5 \binom{n}{x} p^x (1-p)^{n-x} = \sum_{x=3}^5 \binom{5}{x} (0.4)^x (0.6)^{5-x} = \frac{992}{3125} \approx 0.31744.$$
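The binomial value can be checked the same way. This is a small sketch under the assumption that $p$ is passed as an exact `Fraction`; the name `binom_tail` is hypothetical.

```python
from fractions import Fraction
from math import comb

def binom_tail(n, p, k):
    """P[X >= k] for X ~ Binomial(n, p): n independent draws,
    each non-losing with the same probability p."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

p_binom = binom_tail(5, Fraction(2, 5), 3)
print(p_binom, float(p_binom))  # 992/3125 0.31744
```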

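These are very similar, but the hypergeometric model gives a slightly smaller probability. The two models would diverge more substantially if $N$ were smaller or $m$ were larger, because drawing without replacement matters more when the sample is a larger fraction of the population.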
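To see how the disagreement grows, one option is to compare the two PMFs over every outcome at once. The sketch below uses the total variation distance (half the sum of absolute PMF differences) as the yardstick. That measure is my choice for illustration, not something the exercise prescribes, and the parameter sets other than $(500, 200, 5)$ are made up.

```python
from math import comb

def hypergeom_pmf(x, N, n, m):
    # m draws without replacement from N tickets, n of them non-losing
    return comb(n, x) * comb(N - n, m - x) / comb(N, m)

def binom_pmf(x, m, p):
    # m independent draws, each non-losing with probability p
    return comb(m, x) * p**x * (1 - p)**(m - x)

def tv_distance(N, n, m):
    """Total variation distance between the hypergeometric model
    and the binomial model with the same proportion p = n / N."""
    p = n / N
    return 0.5 * sum(
        abs(hypergeom_pmf(x, N, n, m) - binom_pmf(x, m, p))
        for x in range(m + 1)
    )

# Original setup, then a smaller population, then a larger sample.
for N, n, m in [(500, 200, 5), (50, 20, 5), (500, 200, 50)]:
    print(N, n, m, tv_distance(N, n, m))
```

Both modified setups should print a noticeably larger distance than the original one.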

As a more advanced exercise, consider the limiting case where $N \to \infty$ and $n = Np$ for some fixed $p \in (0,1)$. What happens to the hypergeometric PMF?
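Before attempting a proof, it can help to look at the numbers. The following sketch, which is a numerical hint and not a derivation, fixes $m = 5$ and $p = 0.4$, lets $N$ grow with $n = Np$, and prints the largest pointwise gap between the hypergeometric PMF and the binomial PMF. The population sizes are arbitrary values chosen for illustration.

```python
from math import comb

def hypergeom_pmf(x, N, n, m):
    # m draws without replacement from N tickets, n of them non-losing
    return comb(n, x) * comb(N - n, m - x) / comb(N, m)

def binom_pmf(x, m, p):
    # m independent draws, each non-losing with probability p
    return comb(m, x) * p**x * (1 - p)**(m - x)

m = 5
sizes = [50, 500, 5000, 50000, 500000]  # illustrative population sizes
max_gaps = []
for N in sizes:
    n = 2 * N // 5  # keeps n / N = 0.4 exactly for these sizes
    gap = max(
        abs(hypergeom_pmf(x, N, n, m) - binom_pmf(x, m, 0.4))
        for x in range(m + 1)
    )
    max_gaps.append(gap)
    print(N, gap)
```

The gap shrinking as $N$ grows suggests which distribution to aim for when working out the limit by hand.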
