Harvard Stat 110 Strategic Practice 2, Fall 2011 – Inclusion Exclusion – Problem 1.1

combinatoricsinclusion-exclusionprobability

Harvard Stat 110 Strategic Practice 2, Fall 2011 – Inclusion Exclusion – Problem 1.1

For a group of $7$ people, find the probability that all $4$ seasons (winter, spring, summer, fall) occur at least once each among their birthdays, assuming that all seasons are equally likely.

I tried to solve it using another method (which came into my mind at that moment) but I must be doing a mistake. Maybe someone could help me finding the error and also explaining WHY I make the error, so I can avoid it in the future.

I tried to apply the naive definition. So, I have $7$ people and $4$ seasons to choose.

1st person can have $4$ picks, the 2nd $4$ picks, etc etc. All are independent, so I have a total of $4^7$ possibilities.

The number of favourable outcomes are when from the $7$ people, $4$ have each Spring, Summer, Fall, Winter and the other $3$ might get any choice.

From $7$ people, I chose $4$ to fill Spring, Summer, Fall, Winter and the other $3$ can have whatever choice.

So, result should be
$$\binom{7}{4} \cdot \frac{4^3}{4^7}$$
which yields $0.546$ which is clearly different from the practice answer of $0.513$.

Could somebody, please, point out what I am doing wrong?

Thank you!

Best Answer

The error in your reasoning is that you are counting desired outcomes multiple times. To see why, suppose we represent the seasons as $W, S, U, F$ for winter, spring, summer, and fall, and the seven people are ordered so that their birthdays are denoted as a $7$-tuple. Then one desired outcome could be $$(W, S, U, F, S, F, W).$$ But there are at least two distinct ways you could count this: first, by selecting the first four people to have distinct seasons $(W, S, U, F)$, but we could also select the third, fifth, sixth, and seventh people $(U, S, F, W)$ as the ones to have distinct seasons.

To count correctly, what you need to do instead is use the recommended inclusion-exclusion principle. First, count all the outcomes in which there are strictly fewer than four seasons represented: $$\binom{4}{3} 3^7$$ where there are $\binom{4}{3}$ ways to select three distinct seasons out of the four, and $3^7$ represents the number of ordered outcomes using those three seasons. However, this also multiple-counts outcomes in the same way your original approach does, but here we are double-counting outcomes where at most two seasons are represented. So we subtract out $$\binom{4}{2} 2^7$$ of these, but now we have to add back in those outcomes where exactly one season is represented, of which there are $$\binom{4}{1} 1^7.$$ So the total number of desired outcomes is $$4^7 - \binom{4}{3} 3^7 + \binom{4}{2} 2^7 - \binom{4}{1} 1^7 = 8400$$ and the resultant probability is $$\frac{8400}{4^7} = \frac{525}{1024} \approx 0.512695.$$

To verify this, we can run Mathematica with the following inputs:

Select[Tuples[Range[4], 7], Length[Union[#]] == 4 &]

which outputs $8400$, and for simulation, we can input

Length@Select[ParallelTable[RandomInteger[{1, 4}, 7], 10^6], Length[Union[#]] == 4 &]/10^6

which outputs for me $0.513070$ for $10^6$ simulations.