Unknown distribution for birthday problem

birthdayprobability

Coming from Blitzstein's book:

In the birthday problem, we assumed that all 365 days of the year are
equally likely (and excluded February 29). In reality, some days are
slightly more likely as birthdays than others. For example, scientists
have long struggled to understand why more babies are born 9 months
after a holiday. Let $\textbf{p}= (p_1, p_2, …, p_{365})$ be the
vector of birthday probabilities, with $p_j$ the probability of being
born on the $j$th day of the year (February 29 is still excluded, with
no offense intended to Leap Dayers). The $k$th elementary symmetric
polynomial
in the variables $x_1,…, x_n$ is defined by

$e_k(x_1, …, x_n) = \sum_{1 \leq j_1 < j_n < … < j_k \leq n}{x_{j_i} … x_{j_k}}$.

This just says to add up all of the $n \choose k$ terms we can get by
choosing and multiplying $k$ of the variables. For example, $e_1(x_1, x_2, x_3) = x_1 + x_2 + x_3$ and $e_2(x_1, x_2, x_3) = x_1x_2 + x_1x_3 + x_2x_3$.

Now let $k \geq 2$ be the number of people.

a) Find a simple expression for the probability there is at least one
birthday match, in terms of $\textbf{p}$ and an elementary symmetric
polynomial.

I am confused how to do this problem; I've tried a couple approaches. Let $x_j$ be the the day of the year ($x_1$ = January 1, $x_{364}$ = December 30th, etc..).

  1. PIE

$P$(at least one match) = $P$(match on $x_1 \cup$ match on $x_2 \cup … \cup $ match on $x_{365}$.

Now if we look at this with two people, is is a simple: $\textbf{p} \cdot \textbf{p}$ because person 1 can have each birthday $x_i$ with $P(x_i) = p_i$ and for there to be a match person 2 must also have the birthday on $x_i$ . Each event is disjoint, so we have $\textbf{p} \cdot \textbf{p}$.

With 3 people, it becomes a little more confusing with $P$(match on $x_i$) = ${3 \choose 2} p_i^2(1-p_i)^1 + {3\choose3} (p_i)^3$, which I arrived to by saying a birthday on $x_i$ is a success with $P(x_i) = p_i$ and continued as a binomial distribution for 2 or 3 successes.

Note: This can easily be extended that given n $n$ people, the probability 2 or more share a birthday on $x_i$ is the sum from $j=2$ to $n$ successes in a binomial distribution: $\sum_{j=2}^{n} {n \choose j}p_i^j(1-p_i)^{n-j}$

So for 3 people, we still have disjoint events and we have

$\textbf{p}_{k=3} = {{3 \choose 2} p_i^2(1-p_i)^1 + {3\choose3} (p_i)^3}_{1 \leq i \leq 365}$ and the probability of intersecting birthdays is $\textbf{p}_{k=3} \cdot \textbf{p}_{k=3}$.

Once we get to four people, the work becomes really messy as PIE has to start getting used, and honestly my math is probably either A) wrong or B) messy. So, I decided to not take PIE approach.

  1. Counting compliment

Again, with 2 people it is the trivial $1 – (\textbf{p} \cdot (\textbf{1 – p}))$

Immediately, with 3 people and more I realized there is a brutal tree structure where we effectively turn everything into cases, which definitely seemed like the wrong approach.

Could someone help me with what the approach might be for this question? The other post related to this did not help me very much. Thank you in advance!

Best Answer

The complementary event is the event that all $k$ birthdays are distinct.

We can split this event up into $\binom nk$ cases by the choice of $k$ days $\{j_1, j_2, \dots, j_k\}$ that appear as birthdays in the group. For each choice of $k$ days, there are $k!$ ways to assign those days to people, and then there probability is $p_{j_1} p_{j_2} \cdots p_{j_k}$ that all $k$ people have the birthdays we assigned them.

Here's a concrete example. Suppose that $k=3$ and there are only $5$ days in the year. Then the probability that all $3$ birthdays are distinct is $$6 p_1p_2p_3 + 6p_1 p_2p_4 + 6p_1p_2p_5 + 6p_1p_3p_4 + 6p_1p_3p_5 + 6p_1p_4p_5 + \\ + 6p_2p_3p_4 + 6p_2 p_3p_5 + 6 p_2p_4 p_5 + 6p_3p_4p_5.$$

In general, this expression is simplify $k!\, e_k(\mathbf p)$.

Related Question