[Math] Collision probability

probability

We are using character strings (out of 64 possible characters) to identify unique elements in a 'bag' of data. each bag has on average 20 elements, but up to 300.

I am trying to determine what size should that string be so that the probability of a collision (if we pick the characters randomly) is less than 1 in a 1,000,000 for 20 elements, and then for 300 elements.

Best Answer

From Wikipedia ( http://en.wikipedia.org/wiki/Birthday_problem#Probability_table), a one in a million chance to collide is reached very quickly; with only 32 bits (a 4-byte integer value in most languages), 93 elements have a one in a million chance to collide. You can Base-64 encode this integer using your alphabet. With 64 bits (a "long" in most languages), the number of hashes at which there's a one-in-a-million chance increases to 6.1 million.

Related Question