[Math] Why was this probability calculated using the binomial distribution?

binomial distribution, probability, probability distributions

The following is an exercise in this book (Discrete-Event System Simulation – Fourth Edition).

Exercise 5.3

A recent survey indicated that 82% of single women aged 25 years old
will be married in their lifetime. Using the binomial distribution,
find the probability that two or three women in a sample of twenty
will never be married.

Solution

(From the book's solution manual)

Let X be defined as the number of women in the sample never married

$P(2 \le X \le 3) = p(2) + p(3)$

$= \binom{20}{2} (0.18)^2 (0.82)^{18} + \binom{20}{3} (0.18)^3 (0.82)^{17}$

$= 0.173 + 0.228 = 0.401$
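The arithmetic in the manual's solution can be checked numerically. The following is a minimal sketch using only Python's standard library; the helper name `binomial_pmf` and the variable names are illustrative choices, not something from the book.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 20          # sample size from the exercise
p_never = 0.18  # 1 - 0.82: chance that a given woman never marries

p2 = binomial_pmf(2, n, p_never)
p3 = binomial_pmf(3, n, p_never)
total = p2 + p3

# Rounded to three decimals, these match the manual: 0.173 0.228 0.401
print(round(p2, 3), round(p3, 3), round(total, 3))
```

Here a "success" is a woman who never marries, which is why the success probability is 0.18 rather than 0.82.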

My Question

If I understand it correctly, the binomial distribution is a discrete probability distribution of a number of successes in a sequence of n independent yes/no experiments.

But choosing 2 (or 3) women from a 20-woman sample does not consist of independent experiments, because choosing the first woman will affect the probabilities for the subsequent experiments.

Why was the binomial distribution used here?

Best Answer

I think the question is assuming that each individual woman has an 82% chance of getting married, independently of what other women will do.

We aren't choosing a 2- or 3-woman sample; we are merely checking the marital status of 20 women and seeing whether there happen to be 2 or 3 who are unmarried.


EDIT: Another way of looking at the problem:

Let's say we have a ball pit filled with 1 million balls. 820,000 are blue and 180,000 are red. Therefore, if I pick a ball at random, I have an 82% chance of it being blue and an 18% chance of it being red.

Now, what if I draw a blue ball, throw it away, and decide to draw another one? It's true that the probability distribution has changed, since there are now 819,999 blue balls and 180,000 red balls, with 999,999 balls in total. But for simplicity's sake, we can assume the probability distribution hasn't changed very much (by less than $10^{-6}$, in fact), so keeping our 82%/18% distribution is still going to be mostly accurate.
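To see how small that shift is, here is a quick sketch that computes it exactly with the standard library's `Fraction`. The variable names are illustrative. The change in the chance of drawing blue comes out to about $1.8 \times 10^{-7}$, comfortably within the $10^{-6}$ scale mentioned above.

```python
from fractions import Fraction

# Chance of drawing blue before any ball is removed
before = Fraction(820_000, 1_000_000)

# Chance of drawing blue after one blue ball has been thrown away
after = Fraction(819_999, 999_999)

# Exact difference, converted to a float only for display
change = before - after
print(float(change))
```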

If I draw a small number of samples relative to the total number of balls (~20 samples relative to 1 million), the distribution is approximately binomial.

So on a mathematical level, you are correct: the distribution does change when you sample without replacement, but I think the problem wants you to make a simplifying assumption.
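The simplifying assumption can be tested by comparing the binomial answer with the exact without-replacement (hypergeometric) answer. This is a sketch under stated assumptions: the million-ball pit is the one from the example above, while the 50-ball pit with 9 red balls (also 18% red) is a hypothetical added here only for contrast. The function names are illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes) in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def hypergeom_pmf(k, n, red, total):
    """P(exactly k red) when drawing n balls without replacement."""
    return comb(red, k) * comb(total - red, n - k) / comb(total, n)

n = 20  # number of balls drawn

# With replacement / independent draws (the textbook's model)
binom = sum(binomial_pmf(k, n, 0.18) for k in (2, 3))

# Without replacement from the 1,000,000-ball pit in the example
large = sum(hypergeom_pmf(k, n, 180_000, 1_000_000) for k in (2, 3))

# Without replacement from a hypothetical 50-ball pit with 9 red (18%)
small = sum(hypergeom_pmf(k, n, 9, 50) for k in (2, 3))

print(f"binomial:              {binom:.6f}")
print(f"hypergeometric, N=1e6: {large:.6f}")
print(f"hypergeometric, N=50:  {small:.6f}")
```

For the large pit the two models agree to several decimal places, while for the small pit the gap is clearly visible. This is why the binomial distribution is a reasonable model when the sample is tiny relative to the population.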
