[Math] the average Levenshtein distance between two random binary strings of length $L$

binarycoding-theoryprobability

For example, for length $L=7$, two random binary strings might be:

0100101
1010011

The Levenshtein distance here would be 5, as it would require 5 bit-flips to change one to the other.

I attempted a derivation on paper that led me to $$\frac{1}{2^L}\sum_{k=0}^{L}\left(k\cdot{L\choose{k}}\right)$$ which Mathematica gracefully simplified to $$\frac{L}{2}$$

This seems correct, but (my) intuition can often be misleading. If true, though, it'd be inconsistent with something else I'm working with (another person's work), and I'd like to have more confidence before raising question. Surprisingly, I don't see anything on Google, either.

Could anyone point to a source regarding, or confirm by their own derivation, what the average Levenshtein distance between two random binary strings of length $L$, is?

Thank you very much.

UPDATE

@vadim123 has shown me that my premises were wrong. Pulled from his comment:

What you are calling Levenshtein distance is commonly called Hamming distance. For example, 01010101 and 10101010 have Hamming distance 8, but Levenshtein distance 2 (add a 1 at the beginning, delete a 1 at the end).

The actual Levenshtein distance of my example is 2!

 0100101
a
10100101
      d
101001 1

So, now, I'm at a complete loss. What is the real formula, then? Is there one?

Best Answer

Set $X_i$ to be the 0/1 random variable denoting whether two binary strings differ in position $i$. These $L$ random variables are independent, and each 0/1 with equal probability. By linearity of expectation, $E(X_1+X_2+\cdots+X_L)=E(X_1)+E(X_2)+\cdots+E(X_L)=1/2+1/2+\cdots+1/2=L/2$.

Related Question