[Math] Probability of two people having the same birthday

probability

In my Probability course, we have dealt with the Birthday problem. We have been told a reasonable "outcome space" is the $n$-tuples of integers between 1 and 365, identified with the days of the year, if we forget about years with a 29th of February. Then we placed the uniform probability on the space, noting that it is not the exact probability since not all combinations are equally probable, but that using it gives an approximation that is good, and that underestimates the actual probability of having two people with the same birthday. So I was wondering: how do I prove this actually is an underestimate?

Best Answer

Denote the probability of a given person's birthday being on the $k$th day of a year of $n$ days by $p_k$. Then, the probability that two random people have the same birthday is $$P = \sum_{k = 1}^n p_k^2.$$ In particular, when $p_1 = \cdots = p_n$, this probability is $\frac{1}{n}$.

To show that this is an underestimate, it's enough to show that $P$ is actually minimized by the probability distribution $p_1 = \cdots = p_n$ (and to know that the probabilities aren't actually evenly distributed). To set up the minimization problem, note that if we know $p_1, \ldots, p_{n - 1}$, then $$p_n = 1 - p_1 - \cdots - p_{n - 1}.$$ So as a function of $p_1, \ldots, p_{n - 1}$, the probability that two random people have the same birthday is $$P(p_1, \ldots, p_n) = \sum_{k = 1}^n p_k^2 = \sum_{k = 1}^{n - 1} p_k^2 + (1 - p_1 - \cdots - p_{n - 1})^2,$$ and we must minimize this quantity over the region of possible probabilities, namely the simplex $S$ defined by the inequalities $$0 \leq p_1, \ldots, p_{n - 1}, \qquad p_1 + \cdots p_{n - 1} \leq 1.$$

Now, we must solve the system $$\frac{\partial P}{\partial p_k}, \qquad 1 \leq k \leq n - 1.$$ Differentiating the above formula for $p$ and setting to zero gives $$0 = \frac{\partial P}{\partial p_k} = 2 p_k - 2 (1 - p_1 - \cdots - p_{n - 1}).$$ The quantity in parentheses is $p_n$, so this system gives that $0 = 2 p_k - 2 p_n$, or rather, $$p_k = p_n, \qquad 0 \leq k \leq n - 1.$$ But this simply says that there is an critical point where $$p_1 = \cdots = p_n = \frac{1}{n},$$ as desired (in particular, this point is in the simplex $S$, i.e., it corresponds to a distribution of probabilities of birthdays). Since $P$ is quadratic in its arguments, the matrix $$\left(\frac{\partial^2 P}{\partial p_i \partial p_j}\right)$$ of mixed partials is constant, and checking shows that it is positive definite. So, the solution $$p_1 = \cdots = p_n = \frac{1}{n}$$ is the unique global minimum, i.e., if the distribution of birthdays is nonuniform, then $P\left(\frac{1}{n}, \ldots, \frac{1}{n}\right) = \frac{1}{n}$ is an underestimate of the probability that two random people have the same birthday.

Remark (One can avoid calculus when producing the system characterizing the critical point of the probability function $P$ simply by using that a quadratic function $q(t) = at^2 + bt + c$, $a \neq 0$ achieves its extremum at $t = -\frac{b}{2a}$.)

Related Question