Let $h:D\to [m]$ be a hash function that maps some object from the set $D$ to a natural number $k$ ($1 \leq k \leq m$) with probability $1/m$ for every $k\in [m].$ A falsely colliding pair are two elements $d_1, d_2 \in D$ with $d_1 \neq d_2$ but $h(d_1) = h(d_2).$ The probability for a collision between $d_1$ and $d_2$ is $1/m$ because we have $m$ possible positions where they could collide, for each of which there is a $1/m^2$ chance that both elements get mapped to this position. Is my argumentation correct?
Colliding pairs – hash functions
hash function
Related Solutions
$$P(\text{Colllision}) = 1 - P(\text{no collision}) = 1 - {70\over100}\cdot{69\over 100}.$$
Let $\mathbf{1}_{ij}$ denote the indicator random variable for the event that keys $i$ and $j$ are mapped to the same bucket. You are then looking for $\mathbb{E}\left[ \sum_{1\leq i\leq j \leq n} \mathbf{1}_{ij} \right].$ By linearity of expectation, that is
$$ \sum_{1\leq i \leq j \leq n} \mathbb{E}[\mathbf{1}_{ij} ]$$
There are $\binom{n}{2} = \frac{n(n-1)}{2}$ such $i,j$ pairs that we are summing over. Each term is $$\mathbb{E}[\mathbf{1}_{ij}] = \mathbb{P}(\text{keys i and j are mapped to the same bucket}).$$
Since they are mapped uniformly and independently into $m$ buckets, the probability that two keys are mapped to the same bucket is $\frac{1}{m}.$ So the 3rd option is the correct answer.
The step going from "there are $n/m$ expected elements in any bucket" to "so the number of colliding pairs is.." does not work. For example, $n/m$ may not be an integer. Using this approach, we would need to figure out the distribution of the bucket counts. Say we have $2$ buckets and $2$ keys. The two buckets can have the following number of keys in them: $(0, 2), (0,2)$ and $(1,1). (1,1)$ occurs with probability $1/2$, the others have probability $1/4$. Then for each case, we apply your logic to get the number of pair collisions. For $(0,2)$, there's $0$ pairs colliding in bucket $1$ and $1$ pair in bucket $2$. So in this case, there is $1$ colliding pair and this case happens with probability $1/4.$ In the $(1,1)$ case there are $0$ colliding pairs in total, and this case has probability $1/2.$ In the $(0,2)$ case we again have $1$ colliding pair, and that case is probability $1/4.$ So the expected value for $n=2, m=2$ is $$1\cdot(1/4) + 0\cdot(1/2) + 1\cdot(1/4) = 1/2. $$
In principle we can do this same computation for general $n,m$ but these are the types of questions where linearity of expectation comes in so useful - when we only want an expected value, linearity of expectation often allows us to dodge actually computing the distribution.
Best Answer
This is the initial step of the classical birthday paradox and in cryptography it is called the birthday attack.
Consider a uniform hash function with output space size $m$. Then the probability of collision of two distinct input values to have the same hash value, $a,b\in D, a\neq b$ and $h(a) = h(b)$ is $1/m$.
Consider that $h(a)$ can be any value equal likely and $h(b)$ has $1/m$ probability of hitting the same value, i.e. $h(a) = h(b)$, and $(m-1)/m$ probability of having different value, i.e. $h(a) \neq h(b)$.
In the case that you are calculating the probability of the collision by combining for each position as aschepler stated.
$$P(\mathrm{collision}) = \sum_{x \in \mathbb{Z}_m} P(\mathrm{collision~at~}x) = m(1/m^2) = 1/m$$
This is another way to achieve this. For uniform random functions, this is more complex than the standard way, however, if the function is not uniform this is the way to go.
Note: Usually, in cryptography, the hash functions are considered $$H:\{0,1\}^*\to \{0,1\}^n$$ where ${ }^*$ is the Kleene star indication the strings of 0 and 1 in arbitrary sizes, including the empty string.
Then the output space contains $2^n$ elements.