[Math] drawing different colored balls from one urn without replacement and at least

probability distributions

I have this problem (numbers in the example are much smaller than reality, so it would help to get a general equation):

One urn contains $10$ red, $10$ yellow, $10$ black, $10$ green, and $10$ orange balls (total of $50$).

Question 1: if I draw $x$ balls (w/o replacement), what is the probability of getting at least one ball of each color?

Question 2: if I draw $x$ balls (w/o replacement), what is the probability of getting at least one red, one yellow and one black ball?

Question 3: How many balls should I draw to expect to get at least four different colors? (I am not sure whether this last question requires confidence intervals to be answered.)

Many thanks in advance

Best Answer

We answer the first question. The same idea deals with the second, but with less work.

We stick with $5$ colours. Say there are $a$, $b$, $c$, $d$, $e$ balls of the various colours; for convenience we also call the colours themselves $a$, $b$, $c$, $d$, $e$. Let $n=a+b+c+d+e$. There are $\binom{n}{x}$ ways to draw $x$ items, all equally likely.

We count the bad draws, the ones that are missing one or more of the colours. To get the desired probability, we subtract the number of bad draws from the total $\binom{n}{x}$, then divide by that total.

The idea we will use is called the Principle of Inclusion/Exclusion. Some North American texts call it PIE.

There are $\binom{n-a}{x}$ ways to choose $x$ items while leaving out colour $a$, $\binom{n-b}{x}$ ways while leaving out colour $b$, and so on for each of the $5$ colours.

Add up these numbers. That does not give us a correct count of the bads, since we have counted twice the choices that miss, for example, colours $a$ and $b$.

So subtract the sum $\binom{n-a-b}{x}+\binom{n-a-c}{x}+\cdots$, $10$ terms in all.

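We have subtracted too much. A choice that misses, for example, colours $a$, $b$ and $c$ was counted three times in the first sum and then subtracted three times (once for each of the pairs $a,b$; $a,c$; $b,c$), so at this point it is not counted at all.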

So add back $\binom{n-a-b-c}{x}$, plus $9$ other terms of that shape.

We have added back too much, for choices that miss, for example, all of colours $a$, $b$, $c$, $d$ are now counted twice. So subtract $\binom{n-a-b-c-d}{x}$, together with the other $4$ terms of that shape. No draw with $x\ge 1$ can miss all five colours, so the process stops here.
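
The steps above can be collected into one expression. This is only a compact restatement, assuming $x\ge 1$; the symbols $B$ (the number of bad draws), $S$ (a set of missed colours) and $m_S$ (the total number of balls whose colour is in $S$) are notation introduced here for illustration and are not part of the original answer.

$$B=\sum_{\emptyset\ne S\subsetneq\{a,b,c,d,e\}}(-1)^{|S|+1}\binom{n-m_S}{x},\qquad P(\text{every colour appears})=1-\frac{B}{\binom{n}{x}}.$$

The sets $S$ of size $1$, $2$, $3$, $4$ give the $5$, $10$, $10$, $5$ terms described above, with alternating signs.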

If $a=b=\cdots=e$, as in your example, the expression is quite a bit less messy.

We count the bad draws in that case. Say each colour has $k$ balls, so that $n=5k$. The first sum is $\binom{5}{1}\binom{4k}{x}$. From it we subtract $\binom{5}{2}\binom{3k}{x}$, add back $\binom{5}{3}\binom{2k}{x}$, and finally subtract $\binom{5}{4}\binom{k}{x}$.
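
Written out as a single formula (for $x\ge 1$), the equal-counts case reads as follows. The numerical check uses $k=10$ and $x=5$, values chosen here only as an illustration.

$$P=1-\frac{5\binom{4k}{x}-10\binom{3k}{x}+10\binom{2k}{x}-5\binom{k}{x}}{\binom{5k}{x}}$$

With $k=10$, $x=5$ the bad count is $5\cdot 658008-10\cdot 142506+10\cdot 15504-5\cdot 252=2018760$, and $\binom{50}{5}=2118760$, so $100000$ good draws remain and $P=100000/2118760\approx 0.0472$. This agrees with the direct count: drawing $5$ balls with one of each colour can be done in $10^5$ ways.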

Note that we define $\binom{p}{q}$ to be $0$ if $p\lt q$. That makes the expressions formally correct in all cases.
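
For the much larger numbers mentioned in the question, the count is easier to do by machine. The following is a minimal Python sketch of the Inclusion/Exclusion count, not a definitive implementation. The function name `prob_at_least_one_of_each` and its parameters are hypothetical, it assumes $0\le x\le n$, and it relies on `math.comb` returning $0$ when the lower argument exceeds the upper one, which matches the convention above. Restricting the `required` colours handles the second question in the same way.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def prob_at_least_one_of_each(counts, x, required=None):
    """Probability that x draws without replacement contain at least one
    ball of every colour whose index is in `required` (default: all colours).

    counts[i] is the number of balls of colour i. Assumes 0 <= x <= sum(counts).
    """
    n = sum(counts)
    if required is None:
        required = range(len(counts))
    required = list(required)

    good = 0
    # Inclusion/exclusion over the set of required colours that are missed:
    # size 0 contributes all draws, size 1 is subtracted, size 2 added back, ...
    for size in range(len(required) + 1):
        for missed in combinations(required, size):
            removed = sum(counts[i] for i in missed)
            # comb(p, q) is 0 when q > p, as in the convention stated above.
            good += (-1) ** size * comb(n - removed, x)

    # Exact rational result; convert with float(...) if a decimal is wanted.
    return Fraction(good, comb(n, x))
```

For the urn in the question, `prob_at_least_one_of_each([10] * 5, 5)` equals $10^5/\binom{50}{5}$, and `prob_at_least_one_of_each([10] * 5, 3, required=[0, 1, 2])` equals $10^3/\binom{50}{3}$, taking indices $0$, $1$, $2$ to stand for red, yellow and black.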

There is a straightforward generalization beyond $5$ colours. Typically, the later terms in the Inclusion/Exclusion sum become negligible compared to the leading terms, so truncating the sum suitably may give an adequate estimate that is not too hard to compute.
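
To judge how well truncation works for given numbers, one can look at the partial sums of the series. The sketch below does this for the equal-counts case, assuming $k$ balls of each colour and $x\ge 1$. The function name `bad_fraction_partial_sums` is hypothetical, and the results are floating-point approximations rather than exact values.

```python
from math import comb


def bad_fraction_partial_sums(k, x, colours=5):
    """Partial sums of the inclusion/exclusion series for the probability
    of missing at least one colour, with k balls of each colour.

    Entry j-1 of the result uses the terms for 1, 2, ..., j missed colours.
    Assumes 1 <= x <= colours * k.
    """
    total = comb(colours * k, x)
    partial = 0.0
    sums = []
    for j in range(1, colours):
        # Choose which j colours are missed, then draw from the rest.
        term = comb(colours, j) * comb((colours - j) * k, x) / total
        partial += (-1) ** (j + 1) * term
        sums.append(partial)
    return sums
```

The last entry is the full sum, and subtracting any entry from $1$ gives the corresponding estimate of the probability of seeing every colour. Stopping after an odd number of terms overestimates the bad probability and stopping after an even number underestimates it, so consecutive partial sums bracket the exact value. An early partial sum can exceed $1$ when $x$ is small.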
