Probability of hamming distance smaller or equal than d of two strings with k 1-bits

combinatoricsprobabilistic-methodprobability

Let two bit-strings of size $n$ have $k$ number of $1$-bits. If the first bit-string has the form $1^k0^{n-k}$ (for $n=5$ and $k=2$ the bit-string would look $11000$) and the second bit-string is chosen uniformly at random from the set of $\binom{n}{k}$ bit-strings with $k$ $1$-bits. What is the probability that the Hamming distance between the two strings is smaller or equal than $d$.

Obviously for $d=0$ the probability is $\frac{1}{\binom{n}{k}}$ and for $d=2k$ the probability would be 1 since the hamming distance between the two bit-strings is at most $\min{(2k,2(n-k))}$. My intuition tells me that the probability should be exponentially small in $k$ if $d=o(k)$, but I cannot wrap my head around the general problem and how to write it as a formula.

I do not need an exact formula although it would be appreciated. A relatively tight upper bound for the probability is enough.

Best Answer

Let $A(n,k)$ be the set of strings of Hamming Weight $k$ and length $n.$ Let $J=\{1,\ldots,k\}.$ I will count instead of dividing by $\binom{n}{k}$ to get the probability. Call $M(d)$ the number of strings at Hamming distance exactly $d$ from $a$ where $a=(1,\ldots,1,0,\ldots,0)$ in $A(n,k)$.

As you observed there is a unique sequence ($a$ itself) at Hamming distance zero.

Clearly $M(d)=0$ if $d$ is odd since removing a one from $J$ means adding a one to its complement thus giving even Hamming distance. Consider Hamming distance $d=2.$ This corresponds to a sequence with $k-1$ ones in $J$ and a single one in $[n]\setminus J.$ Therefore $$ M(2)=\binom{k}{1}\binom{n-k}{1}. $$ In general, if $d=2d',$ for $d'\in \{1,2,\ldots,m\}$ we get $$ M(d)=\binom{k}{d'}\binom{n-k}{d'}. $$ The number of strings in $A(n,k)$ at Hamming distance $\leq d$ from $a$ is then $$ N(d)=\sum_{0\leq d'\leq d} M(d) $$ which yields $$ N(2)=1+k(n-k), $$ and $$ N(4)=1+k(n-k)+\frac{k(k-1)n(n-1)}{4}, $$ and $$ N(d)=1+k(n-k)+\frac{k(k-1)n(n-1)}{4}+\cdots+\binom{k}{d'}\binom{n-k}{d'} $$ where the last term is dominant. All this assumes that $n\geq 2k,$ otherwise you'd need to truncate your sum below $2k$.

Now use your favourite binomial coefficient inequalities. For small $d$ relative to $n$ the sequence $N(d)$ is superincreasing in $d$ so a constant times the last term gives a decent upper bound.