Probability of X red balls when drawing Y balls from A red and B green balls.

Background

Ok, so I have a homework problem asking for probability distributions. The first step is to calculate the probability of each possible result.

Specifically, I have 7 red balls and 4 green balls and am drawing 5 balls without replacement. I want to know the probability of getting 0, 1, 2, 3, 4, or 5 red balls.

There are various answers to similar problems, but I'm having difficulty finding a question about the general case, or an answer with any real explanation of what's going on. I want a better way than going through a bunch of tedious crap each time, so I tried to solve this myself.

Solution to the Specific Problem

In my specific case, I can brute force it.

I can't possibly get 0 red balls, since there are only 4 not-red balls, so $P(\text{0 red})=0$.

$P(\text{5 red})$ is just $\frac{7_r}{7_r+4_g}$ $\cdot\frac{6_r}{6_r+4_g}$ $\cdot\frac{5_r}{5_r+4_g}$ $\cdot\frac{4_r}{4_r+4_g}$ $\cdot\frac{3_r}{3_r+4_g}$ $=\frac{7_r}{11_t}$ $\cdot\frac{6_r}{10_t}$ $\cdot\frac{5_r}{9_t}$ $\cdot\frac{4_r}{8_t}$ $\cdot\frac{3_r}{7_t}$ $=\frac{1}{22}$.
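The sequential product above can be checked with exact rational arithmetic. A minimal Python sketch using the standard-library `fractions.Fraction` type; the helper name `prob_all_red` is hypothetical and assumes `draws` does not exceed the total number of balls:

```python
from fractions import Fraction

def prob_all_red(reds: int, greens: int, draws: int) -> Fraction:
    """Probability that all `draws` balls, drawn without replacement, are red."""
    p = Fraction(1)
    for i in range(draws):
        # Before draw i+1, there are reds - i red balls left out of reds + greens - i.
        p *= Fraction(reds - i, reds + greens - i)
    return p

print(prob_all_red(7, 4, 5))  # 1/22
```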

Similar math shows that $P(RGGGG)$ $=P(GRGGG)$ $=P(GGRGG)$ $=P(GGGRG)$ $=P(GGGGR)$ $=\frac{7\cdot4\cdot3\cdot2\cdot1}{11\cdot10\cdot9\cdot8\cdot7}$. Since there are five positions the single red can occupy, the total probability is $P(\text{1 red})$ $=\frac{1}{66}$.

The same concept can be used to get that $P(\text{4 red})$ $=P(\text{1 green})$ $=5\cdot P(GRRRR)$ $=5\cdot\frac{4\cdot7\cdot6\cdot5\cdot4}{11\cdot10\cdot9\cdot8\cdot7}$ $=\frac{10}{33}$.
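As an independent check on the 4-red case, a brute-force sketch can label the 11 balls, enumerate all $\frac{11!}{6!}=55440$ equally likely ordered draws with `itertools.permutations`, and count the draws that contain exactly four reds. The variable names are illustrative:

```python
from fractions import Fraction
from itertools import permutations

# Balls 0..6 are red and 7..10 are green; the labels make every ball distinguishable.
balls = ["R"] * 7 + ["G"] * 4

# Every ordered draw of 5 distinct balls is equally likely: 11!/6! = 55440 of them.
draws = list(permutations(range(len(balls)), 5))
four_red = sum(1 for d in draws if sum(balls[i] == "R" for i in d) == 4)

print(len(draws), Fraction(four_red, len(draws)))  # 55440 10/33
```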

It's a bit more annoying with 2 and 3 red balls. Now I have to calculate the probability of every ordering of two reds and three greens. However, I think the math ends up being the same for every ordering. The denominator is always $11\cdot10\cdot9\cdot8\cdot7$ or $\frac{11!}{6!}$ $=55440$. In the numerator, the first red drawn contributes 7, the second 6, the third 5; the first green drawn contributes 4, the second 3, the third 2.
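The hunch that every ordering of two reds and three greens has the same probability can be checked by multiplying the draw-by-draw fractions for each distinct ordering. A minimal sketch; `prob_sequence` is a hypothetical helper name:

```python
from fractions import Fraction
from itertools import permutations

def prob_sequence(seq, reds, greens):
    """Probability of drawing exactly the color sequence `seq` without replacement."""
    p = Fraction(1)
    for ch in seq:
        total = reds + greens
        if ch == "R":
            p *= Fraction(reds, total)
            reds -= 1
        else:
            p *= Fraction(greens, total)
            greens -= 1
    return p

# All distinct orderings of two reds and three greens (the set removes duplicates).
orderings = set(permutations("RRGGG"))
probs = {prob_sequence(s, 7, 4) for s in orderings}
print(len(orderings), probs)  # 10 {Fraction(1, 55)}
```

Only one distinct value appears, $\frac{7\cdot6\cdot4\cdot3\cdot2}{55440}=\frac{1}{55}$, because each ordering multiplies the same numerator factors over the same denominators, just in a different order.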

So $P(RRGGG)$ $=P($any specific permutation of 2 red, 3 green$)$ $=\frac{7\cdot6\cdot4\cdot3\cdot2}{11!\div6!}$. Next, I count the permutations of 2R3G. If the first red is the first draw, the second red has 4 possible positions; if the first red is the second draw, 3; the third draw, 2; the fourth draw, 1. That gives $4+3+2+1$ $=10$ total permutations. So $P(\text{2 red})$ should be $10\cdot\frac{7\cdot6\cdot4\cdot3\cdot2}{11!\div6!}$ $=\frac{2}{11}$.
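The counting argument amounts to choosing which 2 of the 5 draw positions hold the reds. A sketch with `itertools.combinations` that reproduces the $4+3+2+1$ grouping by the position of the first red and then the value of $P(\text{2 red})$:

```python
from fractions import Fraction
from itertools import combinations

# Choose which 2 of the 5 draw positions (numbered 0..4) hold the red balls.
red_positions = list(combinations(range(5), 2))
arrangements = ["".join("R" if i in pos else "G" for i in range(5)) for pos in red_positions]

# Group by where the first red lands: 4 + 3 + 2 + 1 arrangements.
by_first_red = [sum(1 for pos in red_positions if pos[0] == i) for i in range(4)]

# Every arrangement has probability 7*6*4*3*2 / (11*10*9*8*7).
p_two_red = len(arrangements) * Fraction(7 * 6 * 4 * 3 * 2, 11 * 10 * 9 * 8 * 7)

print(by_first_red, len(arrangements), p_two_red)  # [4, 3, 2, 1] 10 2/11
```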

There should be $10$ permutations of 3R2G (same math, except counting the positions of the 2G instead of the 2R), which means $P(\text{3 red})$ $=10\cdot\frac{7\cdot6\cdot5\cdot4\cdot3}{11!\div6!}$ $=\frac{5}{11}$.

Recap: $P(0\text{ red})$ $=0$, $P(1 \text{ red})$ $=\frac{1}{66}$, $P(2 \text{ red})$ $=\frac{2}{11}$, $P(3 \text{ red})$ $=\frac{5}{11}$, $P(4 \text{ red})$ $=\frac{10}{33}$, $P(5 \text{ red})$ $=\frac{1}{22}$. $P(\text{total})$ $=0$ $+\frac{1}{66}$ $+\frac{2}{11}$ $+\frac{5}{11}$ $+\frac{10}{33}$ $+\frac{1}{22}$ $=1$, which is good.
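The sum-to-one check is easiest over the common denominator 66. A small exact-arithmetic sketch of that step:

```python
from fractions import Fraction

# The recap values for 0..5 red balls.
dist = [Fraction(0), Fraction(1, 66), Fraction(2, 11),
        Fraction(5, 11), Fraction(10, 33), Fraction(1, 22)]

# Over the common denominator 66 the numerators are 0, 1, 12, 30, 20, 3.
print([int(p * 66) for p in dist], sum(dist))  # [0, 1, 12, 30, 20, 3] 1
```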

Converting from Specific to General

From this point, I can hopefully find a pattern of how to solve generic questions of this nature. Switching away from the nomenclature given in the title, I have
$T_R=$ total number of red balls $=7$,
$T_G=$ total number of green balls $=4$,
$T=$ total number of balls $=T_R+T_G$ $=11$,
$D_R=$ number of red balls drawn,
$D_G=$ number of green balls drawn, and
$D=$ total balls drawn $=D_R+D_G$ $=5$.

Some preliminaries. A) Since I only care about the number of red balls, the same solution should apply even if I had, for example, 7 red balls, 2 blue balls, 1 white ball and 1 green ball. So really, $D_G$ could be $D_B+D_W+D_G$ or whatever. Ultimately, it's just $D-D_R$. B) My results always end up in the form $P=C\cdot\frac{F_N}{F_D}$.

The first thing I notice is that my fraction's denominator is always the product $F_D=T\cdot(T-1)$ $\cdot\ldots$ $\cdot(T-D+1)$. This simplifies to $F_D=\frac{T!}{(T-D)!}$.
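The denominator is a falling factorial. A quick sketch showing that the explicit product, the factorial quotient, and Python's `math.perm` (available since Python 3.8, as is `math.prod`) all agree for $T=11$, $D=5$:

```python
import math

T, D = 11, 5

# T * (T-1) * ... * (T-D+1): the range stops just before T-D.
falling = math.prod(range(T, T - D, -1))

# Same value as T!/(T-D)!, which the standard library also exposes as math.perm(T, D).
print(falling, math.factorial(T) // math.factorial(T - D), math.perm(T, D))  # 55440 55440 55440
```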

Next, I notice that my fraction's numerator is always the product $F_N=T_R\cdot(T_R-1)$ $\cdot\ldots$ $\cdot(T_R-D_R+1)$ $\cdot T_G$ $\cdot(T_G-1)$ $\cdot\ldots$ $\cdot(T_G-D_G+1)$. This simplifies to $F_N=\frac{T_R!T_G!}{(T_R-D_R)!(T_G-D_G)!}$. Noting that $T_G$ is just the count of not-red balls and $D_G$ is not-red balls drawn, we get $F_N=\frac{T_R!(T-T_R)!}{(T_R-D_R)!(T-T_R-(D-D_R))!}$ $=\frac{T_R!(T-T_R)!}{(T_R-D_R)!(T-T_R-D+D_R)!}$.
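Each half of $F_N$ is also a falling factorial, so `math.perm` computes it directly. One practical difference from the factorial form: when more greens are requested than exist (the 0-red case here, where $T_G-D_G=-1$), $(-1)!$ is undefined, whereas `math.perm(n, k)` returns 0 for $k>n$, which matches $P(\text{0 red})=0$. A sketch; `numerator` is a hypothetical name and assumes $0\le D_R\le D$:

```python
import math

def numerator(T_R: int, T: int, D_R: int, D: int) -> int:
    """F_N = T_R!/(T_R-D_R)! * (T-T_R)!/((T-T_R)-(D-D_R))!, assuming 0 <= D_R <= D."""
    # math.perm(n, k) is n!/(n-k)! and evaluates to 0 when k > n (an impossible draw).
    return math.perm(T_R, D_R) * math.perm(T - T_R, D - D_R)

print(numerator(7, 11, 2, 5))  # 1008, i.e. 7*6 * 4*3*2
```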

Finally, I note that I have a coefficient equal to the number of permutations of $D_R$ red balls and $D_G$ not-red balls. Looking through my book, I have an equation for "permutations of n things, divided into c classes of alike things differing from class to class", where $\text{permutations}$ $=\frac{n!}{n_1!n_2!\cdots n_c!}$. In my case there are $c=2$ classes, with $n=D$, $n_1$ $=D_R$, and $n_2$ $=n_c$ $=D-D_R$. This leads to $C=$ $\frac{D!}{D_R!(D-D_R)!}$.
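The book's formula for permutations of $n$ things in $c$ classes can be written as a small function; with two classes it coincides with `math.comb`. A sketch; `multiset_permutations` is a hypothetical helper name:

```python
import math

def multiset_permutations(class_sizes: list[int]) -> int:
    """n!/(n_1! n_2! ... n_c!), where n is the sum of the class sizes."""
    n = sum(class_sizes)
    result = math.factorial(n)
    for size in class_sizes:
        # Each partial quotient is still an integer, so floor division is exact.
        result //= math.factorial(size)
    return result

# Two classes (2 reds, 3 not-reds) give the same count as "5 choose 2".
print(multiset_permutations([2, 3]), math.comb(5, 2))  # 10 10
```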

Putting all of this together, I get:

$P=\frac{D!}{D_R!(D-D_R)!}\cdot\frac{\frac{T_R!(T-T_R)!}{(T_R-D_R)!(T-T_R-D+D_R)!}}{\frac{T!}{(T-D)!}}$ $=\frac{D!T_R!(T-T_R)!(T-D)!}{D_R!(D-D_R)!(T_R-D_R)!(T-T_R-D+D_R)!T!}$
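The combined factorial formula can be evaluated exactly with `Fraction`, standing in for the WolframAlpha check. For impossible draws (here $D_R=0$), the factor $(T-T_R-D+D_R)!$ has a negative argument and `math.factorial` raises `ValueError`, so this sketch assumes the convention that such cases have probability 0 and returns 0 before evaluating any factorial. `p_red_factorial` is a hypothetical name:

```python
from fractions import Fraction
from math import factorial

def p_red_factorial(D_R: int, D: int, T_R: int, T: int) -> Fraction:
    """D! T_R! (T-T_R)! (T-D)! / (D_R! (D-D_R)! (T_R-D_R)! (T-T_R-D+D_R)! T!)."""
    args = [T_R - D_R, D - D_R, T - T_R - D + D_R]
    if D_R < 0 or any(a < 0 for a in args):
        return Fraction(0)  # a negative factorial argument means the draw is impossible
    num = factorial(D) * factorial(T_R) * factorial(T - T_R) * factorial(T - D)
    den = (factorial(D_R) * factorial(D - D_R) * factorial(T_R - D_R)
           * factorial(T - T_R - D + D_R) * factorial(T))
    return Fraction(num, den)

print([str(p_red_factorial(k, 5, 7, 11)) for k in range(6)])
# ['0', '1/66', '2/11', '5/11', '10/33', '1/22']
```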

I'm not sure if there's a way to simplify that further, but if everything is accurate, I can just plug in the four values in the question (well, noting that $D=D_R+D_G$) and get an answer.

Plugging this into WolframAlpha using the given constants, I get:
$P(0)=$ $0$
$P(1)=$ $\frac{1}{66}$
$P(2)=$ $\frac{2}{11}$
$P(3)=$ $\frac{5}{11}$
$P(4)=$ $\frac{10}{33}$
$P(5)=$ $\frac{1}{22}$
(To get it to work on WolframAlpha, I substituted $d=D$, $x=D_R$, $t=T$, $r=T_R$.)

Since these are the numbers I got earlier, this is promising.

Converting to "n choose k" Format

Looking through my book, there's a formula for combinations. Specifically, combinations of n different things, taken k at a time, without repetitions, is $\text{combinations}$ $=\binom{\color{magenta}{n}}{\color{green}{k}}$ $=\frac{\color{magenta}{n}!}{\color{green}{k}!(\color{magenta}{n}-\color{green}{k})!}$.

I notice the formula I came up with looks a lot like the factorials in the combinations formula. So I'm hoping there's a way to simplify my formula so it uses $\binom{n}{k}$ functions instead of explicit factorials. The first two lines are color-coded so you can see how I moved from one line to the next.

$P=\frac{\color{blue}{D!}T_R!\color{orange}{(T-T_R)!}\color{blue}{(T-D)!}}{D_R!\color{orange}{(D-D_R)!}(T_R-D_R)!\color{orange}{(T-T_R-D+D_R)!}\color{blue}{T!}}$

$P$ $=\color{blue}{\frac{D!(T-D)!}{T!}}$ $\color{orange}{\frac{(T-T_R)!}{(D-D_R)!((T-T_R)-(D-D_R))!}}$ $\frac{T_R!}{D_R!(T_R-D_R)!}$
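The blue factor is the reciprocal of a combination: $\frac{D!(T-D)!}{T!}=1\big/\binom{T}{D}$. A quick exhaustive check of that identity over small $T$, as a sketch:

```python
from fractions import Fraction
from math import comb, factorial

# Check D!(T-D)!/T! == 1/C(T, D) for every 0 <= D <= T <= 20.
ok = all(
    Fraction(factorial(D) * factorial(T - D), factorial(T)) == Fraction(1, comb(T, D))
    for T in range(21)
    for D in range(T + 1)
)
print(ok)  # True
```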

Now, color code it according to the combination definition above.

$P$ $=\frac{\color{green}{D}!(\color{magenta}{T}-\color{green}{D})!}{\color{magenta}{T}!}$ $\frac{\color{magenta}{(T-T_R)}!}{\color{green}{(D-D_R)}!(\color{magenta}{(T-T_R)}-\color{green}{(D-D_R)})!}$ $\frac{\color{magenta}{T_R}!}{\color{green}{D_R}!(\color{magenta}{T_R}-\color{green}{D_R})!}$

Seems pretty straightforward to convert this to $\binom{n}{k}$.

The first factor is $1\big/\binom{T}{D}$, so it moves to the denominator flipped, as $\frac{T!}{D!(T-D)!}$:

$P$ $=\frac{\frac{\color{magenta}{(T-T_R)}!}{\color{green}{(D-D_R)}!(\color{magenta}{(T-T_R)}-\color{green}{(D-D_R)})!} \frac{\color{magenta}{T_R}!}{\color{green}{D_R}!(\color{magenta}{T_R}-\color{green}{D_R})!}}{\frac{\color{magenta}{T}!}{\color{green}{D}!(\color{magenta}{T}-\color{green}{D})!}}$

$P$ $=\frac{\color{orange}{\binom{T-T_R}{D-D_R}}\binom{T_R}{D_R}}{\color{blue}{\binom{T}{D}}}$
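This is the probability mass function of what is usually called the hypergeometric distribution (SciPy provides it as `scipy.stats.hypergeom`, if SciPy is available). A standard-library sketch; `p_red` is a hypothetical name. `math.comb(n, k)` returns 0 when $k>n$, so the impossible 0-red case comes out as 0 without special handling, but a negative $k$ raises `ValueError`, hence the guard:

```python
from fractions import Fraction
from math import comb

def p_red(D_R: int, D: int, T_R: int, T: int) -> Fraction:
    """P(D_R red) = C(T-T_R, D-D_R) * C(T_R, D_R) / C(T, D)."""
    if D_R < 0 or D_R > D:
        return Fraction(0)  # math.comb rejects negative arguments
    # comb(n, k) == 0 when k > n, which covers draws needing more balls than exist.
    return Fraction(comb(T - T_R, D - D_R) * comb(T_R, D_R), comb(T, D))

print([str(p_red(k, 5, 7, 11)) for k in range(6)])
# ['0', '1/66', '2/11', '5/11', '10/33', '1/22']
```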

Checking with WolframAlpha:

$P(0)=$ $0$
$P(1)=$ $\frac{1}{66}$
$P(2)=$ $\frac{2}{11}$
$P(3)=$ $\frac{5}{11}$
$P(4)=$ $\frac{10}{33}$
$P(5)=$ $\frac{1}{22}$

Again, this matches what I got by hand.

Actual Questions

Is my math correct? Is the final answer something I can actually use for other problems of this type, or did I just happen to get a good result because I tested it against the same problem I formulated the answer with? Are there limiting factors here? Is there an easier way to get the same result?

Best Answer

Yep. That's okay.

You took the long, winding road to get to your destination, but enjoyed the scenery, and learned something along the way.

It's all good.


The easier way to get to the result is to use the combinatorial definition of the binomial coefficient and go straight to the goal: $\binom n r$ counts the ways to select $r$ elements from $n$.

Out of all the (equally probable) ways to select $D$ from $T$ balls, we want the probability of selecting $D_R$ of the $T_R$ red balls and $D-D_R$ of the $T-T_R$ remaining balls.

$$\mathsf P(X=D_R) ~=~ \dfrac{\dbinom{T_R}{D_R}\dbinom{T-T_R}{D-D_R}}{\dbinom{T}{D}}$$

Then you just have to note which values of $D_R$ give non-zero probability (the support of the probability mass function).
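On the question of whether the formula only worked because it was tested on the same problem it was derived from, one can compare it against brute-force enumeration of every possible unordered draw on many small urns. A sketch; `p_red` and `brute_force` are hypothetical names, and the enumeration assumes every set of $D$ balls is equally likely, as in the argument above:

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def p_red(D_R: int, D: int, T_R: int, T: int) -> Fraction:
    """Formula: C(T_R, D_R) * C(T - T_R, D - D_R) / C(T, D)."""
    if D_R < 0 or D_R > D:
        return Fraction(0)
    return Fraction(comb(T_R, D_R) * comb(T - T_R, D - D_R), comb(T, D))

def brute_force(D: int, T_R: int, T: int) -> list[Fraction]:
    """Enumerate every equally likely set of D balls; balls 0..T_R-1 are the red ones."""
    subsets = list(combinations(range(T), D))
    counts = [0] * (D + 1)
    for s in subsets:
        counts[sum(1 for ball in s if ball < T_R)] += 1
    return [Fraction(c, len(subsets)) for c in counts]

# Compare on every urn with 1..8 balls, every number of reds, and every draw size.
all_match = all(
    brute_force(D, T_R, T) == [p_red(k, D, T_R, T) for k in range(D + 1)]
    for T in range(1, 9)
    for T_R in range(T + 1)
    for D in range(T + 1)
)
print(all_match)  # True
```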

That is all.


In the notation of the title, with $X$ the number of red balls when drawing $Y$ balls from $A$ red and $B$ green balls:

$$\mathsf P(X=x) ~=~ \dfrac{\dbinom A x~\dbinom B{Y-x}}{\dbinom {A+B}Y} \quad\mathbf 1_{x\in\bigl[\max\{0, Y-B\};\min\{A,Y\}\bigr]\cap\Bbb N}$$
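The support bounds can be checked numerically: a sketch comparing the interval from $\max\{0, Y-B\}$ to $\min\{A,Y\}$ with the values of $x$ whose numerator $\binom A x\binom B{Y-x}$ is actually nonzero. The function names are hypothetical, and the check assumes $Y\le A+B$:

```python
from math import comb

def support(A: int, B: int, Y: int) -> range:
    """x values with nonzero probability: max(0, Y - B) <= x <= min(A, Y)."""
    return range(max(0, Y - B), min(A, Y) + 1)

def nonzero_xs(A: int, B: int, Y: int) -> list[int]:
    """Direct check: x values in 0..Y whose numerator C(A, x) * C(B, Y - x) is positive."""
    return [x for x in range(Y + 1) if comb(A, x) * comb(B, Y - x) > 0]

print(list(support(7, 4, 5)), nonzero_xs(7, 4, 5))  # [1, 2, 3, 4, 5] [1, 2, 3, 4, 5]
```

For the original problem ($A=7$, $B=4$, $Y=5$) the lower bound is $Y-B=1$, which is why $P(\text{0 red})=0$.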