[Math] what is the expected number of collisions

combinatoricsprobabilityprobability distributionsprobability theory

Suppose we use a hash function $H$ to hash $N$ distinct balls into $M$ distinct bins. Assuming simple uniform hashing, what is the expected number of collisions?

Note that a collision is defined by adding a ball to an already occupied bin. If the already occupied bin has $k$ balls in it, then the number of collisions upon adding a new ball is $k.$


By using expectation, I tried as :

=> 1 × Probability of collision in first insertion +

2 × Probability of collision in second insertion + ………. +

n × Probability of collision in nth insertion

=> $(1 ∗ 0) + (2 ∗ 1/m) + (3 ∗ 2/m) + (4 ∗ 3/m) + … + (n ∗ n−1/m)$


Actually, The answer is $(n^2 – n)/2m$


But, I am not getting the answer. Where am I wrong here ?

Best Answer

The expected number of new collisions caused at the time of inserting the $k$-th ball is $\frac{k-1}{M}$ since it has a $\frac1M$ collision probability with each ball already placed.

Thus the expected number of collisions is

$\frac0M+\frac1M+\frac2M+\cdots+\frac{N-1}{M}=\frac{N(N-1)}{2}\cdot\frac1M$