Solved – Chance of collisions between random numbers when inserting into set

probability

I apologize if this is too basic or whatnot, if it is just flag the question away.

I have the following computer code situation:

We need to insert 1000 new random numbers into a set. The set won't tolerate collisions, and each time we execute this job we need to add exactly 1000 random numbers into the set. To avoid the job running for "a long time", every time we generate a number that already exists in the set, generate a new one, but only do this 250 times for each such number. Each random number is in the range 0 through 99.999, which means that as the set fills up towards 100.000 unique numbers, the chance of each such number generation fails and the chance of the whole job fails is increasing exponentially towards the end.

Procedural description:

DO 1000 TIMES:
    TRIES = 0
    LABEL START
    X = RANDOM 0 THROUGH 99.999
    IF X IN SET THEN
        INCREASE TRIES
        IF TRIES > 250, FAIL THE JOB
           ELSE, GO BACK TO START
    ADD X TO SET

What I need to calculate is the chance of the job failing, when we know the existing number of numbers already in the set, which is K.

Here's my take on this.

To calculate the chance of any single randomly generated number already existing in the set:

chance = K / N

This chance goes towards 1.0 when the set fills up (K closes in on N).

The chance of all 250 attempts for a single number failing:

chance = (K / N) ^ T

This starts out low, which is expected, and steeply rises towards the end, which is also expected. Basically, only when the set is close to filling up will the chance of 250 sequential collisions rise up, and when it starts rising, it goes up pretty fast.

The chance of any of the 1000 numbers individually failing:

chance = Y * (K / N) ^ T

This last bit is obviously wrong (and I'm not entirely sure about the previous ones) since after inserting the first number into the set, there is obviously less of a chance of the next one succeeding. That last formula above calculates (presumably, if it is "correct") the chance of finding 1000 such random numbers that could fit into the set and then don't insert any of them into the set, which is not what I want.

The last part of my problem is that I want to solve the final equation for chance, to find the fill-factor of the set. Meaning: Given a chance of failure for the whole job to be 90%, how full does the set have to be in order to give me that failure chance?

Unfortunately all of this is a bit above my old math head, so any help is appreciated.

Best Answer

As you said the probability of failing on the $k_{th}$ try is ${(\frac{k}{N})}^T$. Therefore probability of the $k_{th}$ attempt being successful is $1-{(\frac{k}{N})}^T$. Therefore the probability of success of all of the first $k$ attempts is $\prod_{i=1}^k (1-{(\frac{i}{N})}^T)$. Therefore the probability of failure to get $k$ items is $1- \prod_{i=1}^k (1-{(\frac{i}{N})}^T)$. I do not know how to solve this in closed form to get value of $k$ for 0.9 . But using a for loop or binary search for $k$ should do it.

Related Question