[Math] Solution conflict: Expected number of distinct birthdays for $100$ people

birthday, expected value, probability, random variables

I was given the homework question stated in the title. I disagree with the provided solution, and I was wondering if you could help me understand why it is correct, or whether it is indeed incorrect.

Define $X$ to be the number of distinct birthdays.

The given answer sets up a random variable $X_i$ that is $1$ if the $i$th day is a birthday and $0$ otherwise, where:

$P(X_i = 1) = P(\text{at least one person has a birthday on day } i) = 1 - P(\text{no one has a birthday on this day}) = 1 - \left(\frac{364}{365}\right)^{100}$, and so $\mathrm{E}X_i = 1 - \left(\frac{364}{365}\right)^{100}$.

Thus $\mathrm{E}X = \mathrm{E}[X_1 + X_2 + \dots + X_{365}] = 365\left(1 - \left(\frac{364}{365}\right)^{100}\right)$.

I think this is incorrect, however, because it seems to calculate the expected number of birthdays, not the expected number of distinct birthdays.

The answer that I think is correct is to define $X_i$ as $1$ if the $i$th day is a distinct birthday and $0$ otherwise. Then:

$P(X_i = 1) = 100 \times \left(\frac{1}{365}\right)\left(\frac{364}{365}\right)^{99}$.

Thus $\mathrm{E}X = \mathrm{E}[X_1 + X_2 + \dots + X_{365}] = 365 \times 100 \times \left(\frac{1}{365}\right)\left(\frac{364}{365}\right)^{99} = 100 \times \left(\frac{364}{365}\right)^{99}$.

This has been bothering me for quite some time. Any help would be great.

Best Answer

The provided solution is correct. When it computes the chance that somebody has a birthday on Jan 1, it doesn't care how many people share the birthday. Then it says each day has the same chance of being somebody's birthday and uses the linearity of expectation.
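As a rough numerical check, here is a small Python sketch comparing the two formulas from the question against a simulation. It assumes 365 equally likely days and independent birthdays; the function names are mine and purely illustrative. Note that $100 \times \frac{1}{365}\left(\frac{364}{365}\right)^{99}$ is the probability that exactly one of the 100 people has a birthday on a given day, so the second formula should track the number of days that are the birthday of exactly one person, not the number of distinct birthdays.

```python
import random

DAYS = 365
PEOPLE = 100


def expected_distinct(days=DAYS, people=PEOPLE):
    """Provided solution: days * P(at least one person is born on a given day)."""
    return days * (1 - ((days - 1) / days) ** people)


def expected_unshared(days=DAYS, people=PEOPLE):
    """Formula from the question: days * P(exactly one person is born on a given day)."""
    return people * ((days - 1) / days) ** (people - 1)


def simulate(trials=2000, days=DAYS, people=PEOPLE, seed=0):
    """Estimate both averages by drawing uniform random birthdays."""
    rng = random.Random(seed)
    total_distinct = 0
    total_unshared = 0
    for _ in range(trials):
        counts = {}
        for _ in range(people):
            day = rng.randrange(days)
            counts[day] = counts.get(day, 0) + 1
        # Distinct birthdays: days with at least one person.
        total_distinct += len(counts)
        # Unshared birthdays: days with exactly one person.
        total_unshared += sum(1 for c in counts.values() if c == 1)
    return total_distinct / trials, total_unshared / trials


if __name__ == "__main__":
    sim_distinct, sim_unshared = simulate()
    print("distinct: formula", expected_distinct(), "simulated", sim_distinct)
    print("unshared: formula", expected_unshared(), "simulated", sim_unshared)
```

The first formula comes out at about 87.6 and the second at about 76.2. The simulated count of distinct birthdays should land near the first value, and the simulated count of days with exactly one birthday near the second.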

We can see what is going on with smaller numbers. Say we throw two dice and ask for the expected number of different numbers seen. We can do the problem directly: the first die shows some number, and the second die has a $\frac 56$ chance of adding a new number, so the expected number of distinct numbers seen is $1 + \frac 56 = \frac {11}6$. This is less than $2$ because of the chance that the two numbers are the same. The approach in the solution you quote is to say the chance $1$ does not appear is $(\frac 56)^2$, so the chance it does appear is $1-(\frac 56)^2=\frac {11}{36}$. Then the expected number of numbers we see is $6 \cdot \frac {11}{36}=\frac {11}6$.
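The two-dice case is small enough to check by brute force. The sketch below (variable names are mine, for illustration only) enumerates all 36 equally likely outcomes with exact fractions and compares the direct average with the indicator-variable formula. It also computes the two-dice analogue of the formula proposed in the question, which counts faces that appear exactly once.

```python
from fractions import Fraction
from itertools import product

FACES = 6
DICE = 2

# All equally likely outcomes of throwing DICE dice.
outcomes = list(product(range(1, FACES + 1), repeat=DICE))

# Direct computation: average number of distinct faces over all outcomes.
direct = Fraction(sum(len(set(o)) for o in outcomes), len(outcomes))

# Indicator approach: FACES * P(a given face appears at least once).
by_indicators = FACES * (1 - Fraction(FACES - 1, FACES) ** DICE)

# Analogue of the question's formula: FACES * P(a given face appears exactly once).
exactly_once_formula = (
    FACES * DICE * Fraction(1, FACES) * Fraction(FACES - 1, FACES) ** (DICE - 1)
)

# Brute-force average number of faces that appear exactly once.
exactly_once_direct = Fraction(
    sum(sum(1 for face in set(o) if o.count(face) == 1) for o in outcomes),
    len(outcomes),
)

print(direct, by_indicators)                      # 11/6 11/6
print(exactly_once_formula, exactly_once_direct)  # 5/3 5/3
```

The indicator formula matches the brute-force count of distinct faces, while the "exactly once" formula matches a different quantity: it gives zero credit to a double, even though a double still shows one distinct number.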
