[Math] Expected maximum number of collisions for universal hash function

hash functionprobability

If we hash a set $S$ of $n$ keys into a table of size $n$ with a universal hash function $h$, what is the expected maximum number of keys that collide?

We break down this computation into a sequence of easier steps, as follows.
Let $A_j$ be the event that at least one slot in the hash table has ≥ j keys. We compute
the largest $j$ for which $Prob[A_j] ≤ 1/2$; that $j$ is our answer. Calculating $A_j$ directly
is not straightforward, so we proceed as follows.

(a) Let $A^1_j$ be the event that the table slot 1 gets ≥ $j$ keys under $h$. Supposing you
know $Prob[A^1_j]$, give an upper bound on $Prob[A_j]$.

(b) Let $B$ be the event that a fixed subset $C ⊂ S$ of size $|C| = j$ hashes into slot 1.
That is, each key of $C$ maps to slot 1 under $h$. Calculate the probability $Prob[B]$.

(c) Use $Prob[B]$ to get an upper bound on the probability $Prob[A^1_j]$.

(d) Compute the largest value of j for which $Prob[A^1_j] ≤ \frac{1}{2n}$. Explain how in combination with (a), this $j$ is the expected maximum number of collisions.

I am not sure how to solve this problem using the steps shown. I have managed to find some resources to solve the slot-size bound for hash chaining but I am unable to follow the given steps logically.

Best Answer

For $a$ the point is that $Prob[A_j] \le nProb[A_j^1]$ because each slot has the same chance to have at least $j$ entries. It will actually be less than this because the right side counts cases where two slots each have $j$ entries twice while the left side counts them once. We were asked for an upper bound, so this is not a problem.

For $b$ the idea is to assume that each key maps to a slot randomly, so the chance a given key maps to slot $1$ is $\frac 1n$

For $c$ you just use $b$ and multiply by the number of subsets of size $j$