Probability of Collision [binary vector]

logicprobabilityproblem solvingproof-verificationstatistics

Given a collection of nonzero distinct binary vectors $x_1,…, x_m \in {0,1}^n$. To make look up faster, we hash them to vectors of length $p<n$ by means of a lienar mapping $x_i \rightarrow Ax_i$ where $A$ is a $p\times n$ matrix with 0-1 entries, and all computations are performed modulo 2.

Suppose the entries of the matrix are picked uniformly at random (each independent coin toss)…

1- If we were to pick any $1 \leq i < j \leq m $ what is the prob. that $x_i$ and $x_j$ hash to the same vector and we have a collision?

My attempt: I think that this is similar to the birthday problem from
https://preshing.com/20110504/hash-collision-probabilities/ but I'm really unsure that we are bringing in an exponential 'e'. I'd guess from that link that the probability of collision = 1 – prob(no collision). So P(collision )= $1- e^{-p*(p-1)/2m}$ =$1- e^{-2*(2-1)/2m}$ = $1- e^{-1/m}$. Is that correct or completely the wrong? … or would you leave the answer in terms of p (I think we only pick one i and j so we can simplify here?

2- Prove that if $p \geq 2log_2 m$, then with probability atleast $\frac{1}{2}$ there are no collisions among the $x_i$.

My attempt: I'd say I need the answer to above (which is why I was concerned) …. but we would need $\frac{1}{2} =$ P(No Collision) =$e^{-p*(p-1)/2m}$ … but this is not simplifying to $log_2$ and instead would mean $2mln(\frac{1}{2})= -p^{2}-p$. which is wrong b/c I will have that ln follow around …

Best Answer

I think you should have 2^p in place of p in your formula then you get the correct answer Since you are mapping into {0,1}^p each x_i gets sent to a random entry of {0,1}^p uniformly at random so it's the birthday problem with 2^p birthdays and m persons so by your estimate $\frac{1}{2} =$ P(No Collision) =$e^{-(2^p)*(2^p-1)/2m}$ $2mln(\frac{1}{2})= -(2^p)^{2}+(2^p) $. which is closer I think it's best if you use the theorem that you get probability about 0.5 if you have that the square root of the number of possible birthdays is equal to the number of people (once both are large enough)

Related Question