[Math] Expanding Birthday Paradox / Expected Value

birthdayexpectationprobability

Many of us are probably familiar with the birthday problem where you have however many people in a room and the probability that at least two people in the room share a birthday is some percentage between zero and one (and likely higher than expected if you are someone that isn't familiar with this problem).

I am trying to extend this into a scenario where I can find the expected value (and maybe even variance if I have time) of said experiment.

Here is the scenario:

I enter an empty room with one door leading into it. Then let's suppose we have an infinite amount of people with a random birth date (year is irrelevent). I ask one person to come in at a time and when each person enters the room, we check to see if there are any shared birthdays in the room. If so, then the experiment is over. If not, then the next person is brought in and the birthdays are checked again, until a success happens. What is the expected number of people that will be in the room (including myself) when a success happens?

Assumptions

  • The current year is a non-leap year (with only $365$ days, removing Feb $29$ possibility)
  • Each day of the year is equally likely to have been a possible birth date.
  • There aren't any sets of twins, triplets, etc.

In the link provided above, it says that $20$ is the answer to this kind of problem, but it seems like a different approach was used that what I am trying. I am hoping my method also gives the same answer.

My Attempt

Expected Value is generally calculated by $$E(X) =\sum_k{x_k \space \space {P(x_k)}}$$

So substituting in to fit this scenario, we have: $$E(X) =\sum_{k=1}^{366}{k \space \space {P(X=k)}}$$
with ${P(X=k)}$ in this case being the probability that a success happened when the $k^{th}$ person walked into the room (again, including myself).

We can deduct a formula for ${P(X=k)}$, which is $${P(X=k)} = \left(\prod_{i=1}^{k-1} \frac{{i!}{365 \choose i}}{365^i} \right) \times \left(1-\frac{{k!}{365 \choose k}}{365^k} \right) $$
supposing we let $k \ge 2$. We also know that $P(X=1) = 0$ and that we don't need to go beyond $366$ people, as this is the highest possible amount of people needed in the room to be able to get the success.

The Problem

I feel that my formula for ${P(X=k)}$ is overestimating slightly. With the particular problem being this part: $$\left(1-\frac{{k!}{365 \choose k}}{365^k} \right)$$
This is the typical formula you may see for the birthday problem, but it is making a wrong assumption in that this formula is considering the possibility that more than two people could be sharing a birthday (or that everyone is sharing the same birthday!). In this fictional scenario, that isn't possible. There would be two and only two people with the same birthday in the room. Although I feel like this answer and the real answer may round to the same integer anyways $\left(20\right)$, I would like to be as precise as possible. This question tries to tackle this, but with a fixed group of $23$ people instead of a generalized solution.

There is also the problem of attempting to solve for $E(X)$ by machine (I tried what I have so far with a TI-89 but it couldn't calculate all this without freezing or giving an overload error), so I will need to try putting this into Mathematica later, but figuring out how to do all that is something I can figure out on my own.

So what I need help on is this part: when generalizing for a group of $k$ people, what is the probability that exactly 2 people share a birthday? (the second part of the formula for finding $P(X=k)$. )

Last second thoughts before posting

Doing some scratch work on paper my guess is that the true formula for $P(X=k)$ is $${P(X=k)} = \left(\prod_{i=1}^{k-1} \frac{{i!}{365 \choose i}}{365^i} \right) \times {\left( k-1 \right)}\space \frac{365!}{\left( 365-k+1\right)!} \left( \frac{1}{365} \right)^k $$
which came to me using some inspiration from this post.

Best Answer

Observe that $P(X\ge k)$ is much simpler to calculate: it is merely the probability that in a group of $k-1$ people, no two share a birthday. Thus

$$P(X\ge k)=1\cdot\frac{365-1}{365}\cdot\cdots\cdot\frac{365-(k-2)}{365}=\prod_{n=0}^{k-2}\left(1-\frac n{365}\right)$$

for $k\ge2$. Using the alternative formula for expected value of a random variable which takes only positive integer values, we find

$$E(X)=\sum_{k=1}^\infty P(X\ge k)=1+\sum_{k=2}^{366}\prod_{n=0}^{k-2}\left(1-\frac n{365}\right).$$

WolframAlpha calculates this to be approximately $24.62$.

Related Question