[Math] Problem with the Birthday Problem

birthdayprobabilityprobability theory

I recently learned of the Birthday Problem in probability theory, which essentially states that it only takes 23 people in a room to have a 50% chance that 2 of those people have the same birthday. When I try to calculate this myself, I keep coming up with a very different answer, and I am trying to figure out where I am going wrong. Given the problem "how many people does it take to make it 50% likely that 2 will have the same birthday?", here is how I went about solving it:

First, set this up instead as the problem:

"Given $n$ people (where $n$ is selected arbitrarily), what is the probability that 2 people will have the birthday $q$ (where $q$ can be some number between $1$ and $365$ representing a day of the year)"

Well, the probability of having at least 1 person with the birthday of $q$ is $\frac{n}{365}$. To get the probability of 2 people, you would multiply the probability of having one person by $\frac{n-1}{365}$ (based on the Multiplication Rule of Probability, and the fact that we don't want to count the same person twice).

So we can say, given $n$ people, the probability that 2 of them will both have the birthday $q$ (and therefore the same birthday) is $\frac{n(n-1)}{365^2} = \frac{n^2-n}{365^2}$.

Then I just say, given a probability of $\frac12$, solve for $n$.

$$\frac{n^2-n}{365^2} = \frac12$$
$$n^2-n = \frac{365^2}{2}$$
$$n^2 – n – \frac{365^2}{2} = 0$$
$$n\approx259$$

So, based on this method, it should take $259$ people to have a greater than 50% probability that 2 of them will have the same birthday, not $23$.

Where did I go wrong with solving it this way?

Best Answer

Well, the probability of having at least $1$ person with the birthday of $q$ is $n/365$.

Unfortunately, this isn't quite true. If you could guarantee that the $n$ people didn't share a birthday, then it would be true, but if you could guarantee that, there wouldn't be much of a birthday question! The actual probability that at least one person out of $n$ has $q$ as his or her birthday is

$$ 1-\left(\frac{364}{365}\right)^n $$

where we assume (as is typical for initial analyses of this problem) that birthdays are uniformly distributed over a $365$-day year. If $n$ is small, this is approximately equal to $n/365$ (and it is exactly true for $n = 1$), but this is increasingly untrue as $n$ becomes larger.

Note that if $n > 365$ then $n/365 > 1$, but we know it is perfectly possible for each of (let's say) $400$ people to happen not to have May $21$ for their birthday. The above expression gives that probability (usual assumptions applying).

To get the probability of $2$ people, you would multiply the probability of having one person by $(n−1)/365$.

This also isn't true. In order to determine the probability that at least two people have $q$ as their birthday, it's simplest to subtract from the above probability the chance that exactly one person has $q$ as their birthday. This yields

$$ 1-\left(\frac{364}{365}\right)^n -n\left(\frac{1}{365}\right)\left(\frac{364}{365}\right)^{n-1} $$

As before, what you say is approximately true for small $n$ (and exactly true for $n = 2$), but still is increasingly untrue for larger $n$, for nearly the same reason.

And if you wanted the probability of at least $k$ of the $n$ people sharing $q$ as their birthday, we could continue subtracting all the way up to $k-1$ people sharing $q$ as their birthday:

$$ 1-\sum_{i=0}^{k-1} \binom{n}{i} \left(\frac{1}{365}\right)^i \left(\frac{364}{365}\right)^{n-i} $$

So we can say, given $n$ people, the probability that $2$ of them will both have the birthday $q$ (and therefore the same birthday) is $(n^2-n)/365^2$.

Even if the above were true, the analysis would have to be expanded to cover all possible $q$, rather than only a single $q$. If you evaluate my second expression above at $n = 2$, the result is $1/365^2$, the probability that two people share $q$ as their birthday. If you multiply that by $365$, the number of possible choices for $q$, you get $1/365$, which is the probability that two people share any day as their birthday. However, that expression cannot be extended to expand beyond $n = 2$ exactly, because it neglects the possibility that more than two people share the same birthday.

The standard analysis of the birthday problem is the most straightforward, in which the probability of at least two of $n$ people sharing a birthday is

$$ 1-\left(\frac{365}{365}\right) \left(\frac{364}{365}\right) \left(\frac{363}{365}\right) \cdots \left(\frac{366-n}{365}\right) $$

Related Question