[Math] Probability that a set of ‘N’ random binary strings are all at least a certain Hamming distance ‘k’ apart

combinatoricsprobability

Imagine I have a set of $N$ binary strings of length $L$, where I generate each string randomly (say, by flipping a coin for each bit). What is the probability that all $N$ strings are at least a Hamming distance $k$ apart?

I would be happy with a good lower bound estimate on the probability that all strings are unique. We can estimate the relative sizes of $N$, $L$, and $k$ as: $N >> L$ (by at least an order of magnitude), $5 \leq L \leq 100$, and $k < L$.

Best Answer

There are $2^L$ binary strings of length $L$. The number of ordered $N$-tuples of distinct strings is $(2^L)!/(2^L-N)!$, so the probability that all are unique is $\dfrac{(2^L)!}{(2^L-N)! 2^{NL}}$. As $M = 2^L \to \infty$ with $N$ fixed, that is asymptotically $$1 - \dfrac{N(N-1)}{2M} + \dfrac{N(N-1)(N-2)(3N-1)}{24M^2} + \ldots$$

The Hamming distance between a pair of random binary strings of length $L$ is a binomial random variable with parameters $L$ and $1/2$. For $k << L$ the probability that this distance is $k$ is ${L \choose k} 2^{-L}$. In $N$ random strings the expected number of unordered pairs with distance $< k$ is $\mu = \dfrac{N(N-1)}{2} \sum_{j=0}^{k-1} {L \choose j} 2^{-L}$. When $\mu$ is small, we should be able to approximate the distribution of the number of such pairs with a Poisson distribution of parameter $\mu$, so the probability that there are no such pairs is approximately $e^{-\mu}$.

Related Question