What’s the probability in a group of 8 people at least 2 people rolling the same number

birthdayprobabilitystatistics

In a group of 8 people what are the chances of at least 2 people getting the same number if they're rolling from 1 to 99 on a uniform distribution?

I've tried working this out by finding out the probability of no one in the group getting the same number then taking that from 1.

My initial calculation was thus $$1 – (98 \times 97 \times 96 \times 95 \times 94\times 93\times 92)/99^7 = 0.252 (3sf)$$ which seems way too high.

The other way of solving this is to find the probability of exactly 1<x people rolling the same number then adding them together, in which I got 0.00821 (3 sf) which is way too small since the probability must be greater than 1/99.

The calculation being
$\sum_{i=1}^7 \frac{98!}{99^7 \times (91+i)!}$
Can someone point out where I've gone wrong?

Best Answer

First of all, I upvoted your query. I regard your query in a very positive light, because you are (in effect) stretching your intuition. I regard intuition as indispensable in problems involving probability &/or combinatorics.

In your initial calculation, I am unfamiliar with your RHS terminology, but your LHS expression is accurate.

Critiquing your alternate approach is challenging, because I have to speculatively reverse-engineer your #'s. I was not able to do this. That is, I could not come to terms with your thinking.

However, although I do not regard this alternative approach as to be preferred in this problem, I do think that it is still highly educational to explore it. So...

The first thing to consider is how you are going to express your sample space, which normally becomes the denominator (D) of a fraction. Then when you compute the numerator (N), in a manner consistent with how you computed the denominator, the answer becomes $(N/D).$

The easy way is to set $D = (99)^8,$ which requires than when computing $N$, you construe order as important. That is, assuming that the people get a number in ascending order (i.e. Person-1 before Person-2), then [Person-1 = 5, Person-2 = 20] must be counted separately from [Person-1 = 20, Person-2 = 5].

The alternative method of setting $D$ is to accept the complexity that order must not be deemed important. I am going to reject this approach out of hand, because setting $D = (99)^8$ is simple and foolproof, as long as you are extremely careful when computing $N$.

When considering the different ways that a match occurred (i.e. computing $N$), one of the challenges is to avoid over-counting. There are two ways to approach this challenge.

(1) Inclusion-Exclusion : Given sets A, B, and letting |X| denote the number of elements in set X, you have
$|A \cup B| = |A| + |B| - |A \cap B|.$

(2) Partition the cases of matched numbers into mutually exclusive groups whose union represents all pertinent cases. The mutual exclusivity is critical for avoiding over counting.

For this problem, I would rather eat ground glass than try (1) above, so against the backdrop of $D = (99)^8,$ I am going for (2).

First of all, totally ignoring what the matching numbers are, in how many different ways can at least two of the people have matching numbers?

I would identify the various ways into cases (i.e. each case is to be a mutually exclusive group), be sure that I haven't omitted any cases, be sure that each case is mutually-exclusive from all of the others, and then count the # of elements in each case (i.e. each group). Note that the counting method used must be consistent with setting $D = (99)^8.$

Group (1):
All 8 people had the same #.
The # of ways is 99, since once the # to be matched is to be identified, there is only one way that it can occur.

Group (2):
7 people had the same #, and there was one odd man out.
There are 8 ways of choosing the odd man out.
Here, it would be a serious mistake to assume that there are
$\binom{99}{2}$ ways of choosing the pertinent numbers,
because having 7 people match #50, and 1 person getting #49 is different
from having 7 people match #49, and 1 person getting #50.

Therefore you have to
compute the # ways of choosing the numbers to be selected as
$99 \times 98$.
This makes the # of elements in this group:
$8 \times 99 \times 98.$

If pursued to conclusion, this answer would be very long. One of the challenges would be to precisely identify each of the groups.

If you represent the first two groups as (for example)
Group 1 = (8), Group 2 = (7,1),
Then (in my opinion) the next groups would be represented by
Group 3 = (6,2), and Group 4 = (6,1,1).

Instead, as one last examination, I am going to explore the group
(4,2,2), because I think that it has educational value.
There are $\binom{8}{4} \times \binom{4}{2} = \frac{8!}{4! \times 2! \times 2!} $
ways that the 8 people can be appropriately grouped into (4,2,2).

At this point it is essential to ask yourself some hard questions. For example, in the above computation method, is the grouping of (for example) the people into (1,2,3,4), (5,6), (7,8) over counted.
I say yes it has been over counted.

The easiest way to see this, is that once people (1,2,3,4) are grouped together
person-5 may go with either person-6, person-7, or person-8. Therefore, the correct enumeration is $\binom{8}{4} \times 3.$

Keeping in mind the backdrop of $D = (99)^8,$ it is clear that there are 99 choices for the first #. Then, the question becomes are the further choices to be computed as $98 \times 97$, or
$\frac{98 \times 97}{2}.$

Keeping in mind that the grouping of (1,2,3,4), (5,6), (7,8)
has not been over counted, and keeping in mind that you are looking for a counting method consistent with $D = (99)^8$,
I regard $(98 \times 97)$ as correct.

The easiest way to see this is to pretend that #99 was assigned to
the group (1,2,3,4).
Then, regardless of who Person-5 is partnered with
there are 98 choices for the number that is to be assigned
to Person-5 and his partner. Then, there are 97 choices for the number that
is to be assigned to the final 2 people.

Thus, for the group (4,2,2), the computation should be
$\binom{8}{4} \times 3 \times 99 \times 98 \times 97.$

I realize that I have (in effect) dodged your actual question. I did this only because, as I say, I could not reverse engineer your thinking. Hopefully, this answer will show you the complexities involved.

Addendum

Response to Lunatic Flowers' comment of 2020-09-29, at about 10:30am PST

First of all, good thinking. Your comment indicates that you attach importance to the idea that the cases you identify must be mutually exclusive. This leaves three challenges:

(1)
In addition to the cases being mutually exclusive, they must also be comprehensive. That means that any eventuality that could occur (such as 5 people getting one number and 3 people getting a different number) must also be assigned a separate case.

(2)
You must stick with a single idea for the sample space (i.e. denominator $= D$). In this case, I somewhat arbitrarily chose $D = (99)^8$, simply because it is the simplest way of defining the sample space.

(3)
When enumerating the # of ways that a specific case can occur, you must be extremely careful. That is, once you chose a strategy for defining your sample space, then for each case that you enumerate, you must ensure that its enumeration is perfectly consistent with your strategy, re defining the sample space.


Assume that you accept that $D = (99)^8.$

Assume that you are focusing on the specific case where exactly $x$ people got the same #, and the other $(8-x)$ people each got a different number, so that of the numbers 1 through 99, $[8 + 1 - x]$ of these numbers were distributed.

To facililitate a precise enumeration, I would pretend that each person was assigned a name-tag (i.e. Person-1, Person-2, ... Person-8). I would then assume that the specific #'s (re from 1 to 99) to be distributed would be assigned as follows:

First, a number would be assigned to the group of $x$ people.

Then, with respect to the $(8-x)$ remaining people, the person with the lowest name-tag (e.g Person-1, if he is not part of the group of $x$ people), would then be assigned a number. Following that, of the people that had (still) not yet been assigned a number, the person with the lowest name-tag would then be assigned a number, and so forth.

The idea is that if (for example) neither Person-7 nor Person-8 are part of the matching group of $x$ people, the possibility of Person-7 being assigned #51 (for example) and Person-8 being assigned #50 must be considered as distinct when compared with the possibility of Person-7 being assigned #50 and Person-8 being assigned #51.

This approach would guarantee that the numerator for this specific case would be enumerated in a manner consistent with the denominator of $D = (99)^8.$

With this in mind, there are $\binom{8}{x}$ ways of selecting $x$ people to represent the group that had matching #'s. Further, once this group is selected, then one number must be assigned to this group, and the other $(8-x)$ people must each be assigned a separate number.

Therefore, my enumeration of the numerator for this specific case is

$\displaystyle \binom{8}{x} \times \frac{99!}{\left(99 - [8 + 1 - x]\right)!}.$

As an example, when $x = 3,$ the enumeration is

$\displaystyle \binom{8}{3} \times 99 \times 98 \times 97 \times 96 \times 95 \times 94.$

Again, I am unable to grapple with how you enumerate this possibility. I hope that you are able to understand my formula for the numerator when $x=3$, and for the general value of $x$. Please leave further comments, if you have more questions/challenges/disagreements.