Birthday paradox: How to estimate the probability of two or more people in a group of 30 sharing a birthday

Tags: birthday-paradox, probability, r

I might be overthinking this. I generated the output in R, and 5 of my 10 samples were successful, so that's 50%. Given that, if I want to estimate the probability of two or more people in a group of 30 sharing a birthday, what should my total sample be? Should I be using combinations?

Best Answer

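How are you generating your birthdays? For example, to generate birthdays for a group of 23 people (the smallest group for which the probability of a shared birthday exceeds 50%):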

dates = sample(1:365, 23, replace = TRUE)  # 23 days, each equally likely to be any of 1..365

To check whether two or more of them share a birthday:

length(dates) != length(unique(dates)) # TRUE if there are duplicates
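As an alternative sketch, base R's `anyDuplicated()` returns 0 when a vector has no duplicates and otherwise the index of the first duplicated element, so the same check can be written as below. The helper name `has_shared_birthday` is hypothetical and only for illustration.

```r
# Hypothetical helper: TRUE if at least two entries of `dates` are equal
has_shared_birthday <- function(dates) {
  # anyDuplicated() gives 0 for no duplicates, else the index of the first one
  anyDuplicated(dates) > 0
}

dates <- sample(1:365, 23, replace = TRUE)
# Both checks agree for any vector of dates
identical(has_shared_birthday(dates), length(dates) != length(unique(dates)))  # TRUE
```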

To estimate how often this is TRUE, repeat the experiment many times and count:

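dupe_count = 0      # groups with at least one shared birthday
runs = 1000000      # number of simulated groups
for (i in 1:runs) {
  # draw 23 birthdays, each equally likely to be any of 365 days
  dates = sample(1:365, 23, replace = TRUE)
  if (length(dates) != length(unique(dates))) {
    dupe_count = dupe_count + 1
  }
}
print(dupe_count / runs)  # fraction of simulated groups with a shared birthday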

[1] 0.508158

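This closely matches the theoretical value of 50.7% given on the Wikipedia page for the birthday problem. For your group of 30, change 23 to 30 in the code above; the theoretical value is then about 70.6%.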
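The theoretical value can also be computed directly as one minus the probability that all n birthdays are distinct, P(n) = 1 - (365 × 364 × ... × (365 - n + 1)) / 365^n, under the same assumption of 365 equally likely days. The sketch below (the function names `p_shared_exact` and `p_shared_sim` are hypothetical) computes this and compares it with a simulated estimate. The standard error of a simulated proportion is sqrt(p(1 - p)/runs): with only 10 runs it is roughly 0.15, which is why 5 successes out of 10 says little, while with 100,000 runs it drops to roughly 0.0015.

```r
# Exact probability that at least two of n people share a birthday,
# assuming `days` equally likely birthdays (Feb 29 ignored)
p_shared_exact <- function(n, days = 365) {
  if (n > days) return(1)  # pigeonhole: a repeat is guaranteed
  # product of (days - k) / days for k = 0, ..., n - 1
  1 - prod((days - seq_len(n) + 1) / days)
}

# Monte Carlo estimate of the same probability from `runs` simulated groups
p_shared_sim <- function(n, runs, days = 365) {
  hits <- replicate(runs, anyDuplicated(sample(1:days, n, replace = TRUE)) > 0)
  mean(hits)
}

p_exact <- p_shared_exact(30)
print(p_exact)  # about 0.706

set.seed(42)  # arbitrary seed, only for reproducibility
runs <- 100000
p_hat <- p_shared_sim(30, runs)
se <- sqrt(p_hat * (1 - p_hat) / runs)  # standard error of the estimate
cat("estimate:", p_hat, " standard error:", se, "\n")
```

Increasing `runs` shrinks the standard error in proportion to 1/sqrt(runs), so quadrupling the number of simulated groups roughly halves the uncertainty.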