Calculation of probability that at least 2 out of n people have the same birthday leads to paradox

birthdaycombinatoricsprobability

Usually, the way the probability that at least 1 pair out of n people have the same birthday is calculated is by calculating the probability of the converse statement – the probability that no pair has the same birthday. I attempted to calculate the probability the straightforward way.

I use the facts that the probability of $A \lor B$ occurring is $P(A) + P(B)$. And the fact that the number of ways of picking an unordered pair from a set of $m$ elements is $\frac{(m-1)m}{2}$. The probability that a given pair of people have a birthday on the same day is $\frac{1}{365}$. Let our $n$-element set (of people, say) be the following: $\{X_1,X_2,…,X_n\}$. And let the relation $\sim$ of $X_a$ and $X_b$ mean that the people $X_a$ and $X_b$ have the same birthday. The probability that we're after is the following:
\begin{equation}
P[(X_1 \sim X_2) \lor (X_1 \sim X_3) \lor \cdots \lor (X_1 \sim X_n) \lor (X_2 \sim X_3) \lor \cdots \lor (X_{n-1} \sim X_{n})]
\end{equation}

In other words, all possible pairs OR-ed with each other. However, using the OR fact that I mentioned earlier, we can split it up into separate probabilities:
\begin{equation}
P(X_1\sim X_2) + P(X_1 \sim X_3) + \cdots
\end{equation}

Since there are $\frac{(n-1)n}{2}$ pairs and the probablity of each pair having the same birthday is $\frac{1}{365}$, the total probability should be $\frac{n(n-1)}{2}\frac{1}{365}$

However, take 40 people. By the formula I just calculated, the probability should be $\frac{39*40}{2}\frac{1}{365} \approx 2.1$, which is more than 100%… But there exist arrangements of 40 people in 365 days where none have the same birthday, so the porbability should definitely be less than 100%

Also this doesn't seem to match the formula you get the other way:
\begin{equation}
\frac{365!}{365^n(365-n)!}
\end{equation}

Essentially, my question is, where is the mistake in the above calculation? On a side note, this seems like a pretty nice false-proof.

Best Answer

The probabilities are not disjoint. For example, if a collection $3$ people all share a birthday, then we have $X_1\sim X_2$ and $X_2\sim X_3$ and $X_1\sim X_3$. Your calculation triple-counts that case, and it sextuple-counts a case where there are $4$ people with a common birthday, etc.

This means that we can't just say $P(A\lor B\lor C\lor \cdots)=P(A)+P(B)+P(C)+\cdots$, because this will (as you notice) lead to an overestimate of the probability. If you're doing this, you need to essentially add and subtract off the probabilities that multiple people share birthdays, utilizing the inclusion-exclusion principle. This leads to complicated computations that don't scale well, so counting the complement is considerably easier.