What’s the probability that in a box of a dozen donuts with flavors randomly picked out of 14 no more than 3 flavors are in the box? (Check the work)

binomial-coefficientscombinatoricsprobabilityproof-verification

Suppose that in a donut shop that offers 14 flavors of donuts there's a "grab bag" box with random flavors thrown in, each flavor equally likely for each donut. What is the probability that the box contains no more than three flavors?

"Order" of the donuts doesn't matter, and sampling is done with replacement; the donut shop always has enough donuts for each flavor. So the number of "grab bag" boxes of donuts is ${14 + 12 – 1 \choose 12} = {25 \choose 12} =5200300$.

Now for finding how many ways for there to be no more than three flavors in the box. If we knew what the flavors were, the solution would be ${12 + 3 – 1 \choose 12} = {14 \choose 12} = 364$, but we don't know what the flavors are. My first approach was to try ${14 \choose 3}{14 \choose 12}$; that is, pick the three flavors, then pick a donut box when those are the three flavors. But I believe this leads to double-counting since we could have two sets with two flavors in common and only one of the two flavors in common is used in the box of donuts. Thus this is incorrect.

Here's my new approach. If there's exactly one flavor, then we pick it and fill the box with that flavor; there's 14 ways to pick one flavor. If there's exactly two flavors in the box, we'll call them Flavor 1 and Flavor 2. There is at least one donut of Flavor 1 and one of Flavor 2. Now pick the rest of the donuts' flavors, order doesn't matter, there is replacement; there are ${10 + 2 – 1 \choose 10} = {11 \choose 10} = 11$ ways to do that. Then pick the two flavors: there's ${14 \choose 2}$ ways to do that, and thus $11{14 \choose 2}$ boxes with exactly two flavors. Similarly, for exactly three flavors, there are ${14 \choose 3} {9 + 3 – 1 \choose 9} = {14 \choose 3} {11 \choose 9}$ ways for there to be exactly three flavors. Sum these numbers.

Does this appear to be correct? Have I double-counted again?

Best Answer

The first method indeed over counts. For example, consider the possibility of only one flavor in the box. The first method would tell you how to choose $3$ flavors from $14$ then see how to arrange these flavors in the box. You could have $12$ of flavor #1, $0$ of flavor #2, and $0$ of flavor #3. But this method also separately counts $12$ of flavor #1, $0$ of flavor #2, and $0$ of flavor #4, for example, which is the same as what we had before. So indeed the first method over counts. This looks to be what you were saying in the question body, but here it is in more detail.

When you physically break the problem into cases like you do in your second method, you remove this over counting that was going on in the first method because you're enforcing the rule that to be considered a part of the case with $3$ flavors in the box, there can't be a flavor represented with $0$ donuts in the box.

All in all, your second approach is the way to go for the combinatorics answer.

However, this isn't the way to find the probability, as you are now mistaking each kind of box as being equally likely to be possible. The probability of having a box of all flavor #1 donuts is not the same as having a box with $11$ flavor #1s and $1$ flavor #2. Think about coin flips: if you flip a coin twice, is the chances of two heads and one head one tails equally likely? To find the probability, take note that this follows a Multinomial Distribution and see if you can go from there.


The probability of this happening is extremely small, as shown by this R script. It generates various possible boxes and checks if the count of different flavors in the box is less than $3$.

total = 0
for (y in 1:10000000){
  x = rmultinom(1,12,c(1/14,1/14,1/14,1/14,1/14,
                       1/14,1/14,1/14,1/14,1/14,1/14,1/14,1/14,1/14))
  x <- c(x)

  count = 14
  for(i in x){
    if(i==0){
     count = count -1
   }   

  }
 if(count <= 3){total = total + 1}
}
sprintf("%.20f", total / 10000000)

One run of this code produced that a box only has at most three flavors $27$ times out of $10,000,000$ boxes.

Related Question