[Math] Question on probability in hashing

probability

Consider a hash table with n buckets, where external (overflow) chaining is used
to resolve collisions. The hash function is such that the probability that a key
value is hashed to a particular bucket is
1/n.
The hash table is initially empty and K
distinct values are inserted in the table.
(a) What is the probability that bucket number 1 is empty after the Kth
insertion?
(b) What is the probability that no collision has occurred in any of the K
insertisons?
(c) What is the probability that the first collision occurs at the Kth insertions?

Best Answer

(a) The probability that bucket 1 is empty after ONE insertion is $(n-1)/n$. That's the probability that the first item didn't hash to bucket 1. The event that it's empty after TWO insertions is defined by "first item missed bucket 1" AND "2nd item missed bucket one". With this, you can (I hope) compute the probability that the bucket's empty after two insertions. From this, you can generalize to $K$ insertions.

(b) For $K = 1$, it's 1. For $K = 2$, the second item must miss the bucket of the first item. So it has $n-1$ places it can safely go. The probability of success is therefore $\frac{n-1}{n}$. What about the third item? It has only $n-2$ places it can go. So the probability for $K = 3$ is $1 \cdot \frac{n-1}{n}\cdot \frac{n-2}{n}$. I'll bet you can generalize. Be careful of the case $K > n$.

(c) Once you work out the details of the first two parts, you can probably make progress on this one as well. Hint: the first collision occurs on the $k$th insertion if (i) the first $k-1$ insertions didn't collide (see part b) and (ii) the $k$th insertion DOES cause a collision (see the complement of part b).