Colliding pairs – hash functions

hash function

Let $h:D\to [m]$ be a hash function that maps some object from the set $D$ to a natural number $k$ ($1 \leq k \leq m$) with probability $1/m$ for every $k\in [m].$ A falsely colliding pair are two elements $d_1, d_2 \in D$ with $d_1 \neq d_2$ but $h(d_1) = h(d_2).$ The probability for a collision between $d_1$ and $d_2$ is $1/m$ because we have $m$ possible positions where they could collide, for each of which there is a $1/m^2$ chance that both elements get mapped to this position. Is my argumentation correct?

Best Answer

This is the initial step of the classical birthday paradox and in cryptography it is called the birthday attack.

Consider a uniform hash function with output space size $m$. Then the probability of collision of two distinct input values to have the same hash value, $a,b\in D, a\neq b$ and $h(a) = h(b)$ is $1/m$.

Consider that $h(a)$ can be any value equal likely and $h(b)$ has $1/m$ probability of hitting the same value, i.e. $h(a) = h(b)$, and $(m-1)/m$ probability of having different value, i.e. $h(a) \neq h(b)$.


In the case that you are calculating the probability of the collision by combining for each position as aschepler stated.

$$P(\mathrm{collision}) = \sum_{x \in \mathbb{Z}_m} P(\mathrm{collision~at~}x) = m(1/m^2) = 1/m$$

This is another way to achieve this. For uniform random functions, this is more complex than the standard way, however, if the function is not uniform this is the way to go.


Note: Usually, in cryptography, the hash functions are considered $$H:\{0,1\}^*\to \{0,1\}^n$$ where ${ }^*$ is the Kleene star indication the strings of 0 and 1 in arbitrary sizes, including the empty string.

Then the output space contains $2^n$ elements.

Related Question