Hamming Distance between two equal strings with repetition.

algorithmscoding-theory

The Hamming Distance between two strings of equal length is the number of characters that are different from one another, in the same position, between two strings of equal length. You can also say that it's the number of "edits" you need to make in order to change one string into the other. For example if we have $s_1=\text{Anna}$ and $s_2=\text{Sara}$, then the Hamming Distance is $3$ or if we have $s_3=\text{Casablanca}$ and $s_4=\text{Blackjacks}$ then the Hamming Distance would be $9$. But what if we have the following example: $s_5=\text{Google}$ and $s_6=\text{Baangs}$. If we do the following changes: $\text{G} \to \text{B}, \text{o} \to \text{a}, \text{g} \to \text{n}, \text{l} \to \text{g}, \text{e} \to \text{s}$ which gives a Hamming Distance of $5$ but if you consider the number of positions where the characters are different, we get $6$.

My initial thought here is that we are only allowed to change one character at a time, so when we change an $\text{o}$ to an $\text{a}$, we've only changed one of the $\text{o's}$ and we need to do it again for the second $\text{o}$.

Best Answer

It is $6$ in that case. Even a better example would be Hamming distance between the binary words. There, we only have $1$'s and $0$'s so if the things were as you said, Hamming distance between two binary words would be at most $2$ ($0$'s to $1$'s and $1$'s to $0$'s). But clearly, this is not the case.

In order to avoid this kind of confusion, we can define Hamming distance between two strings with same length $S_1 = a_1a_2...a_n$ and $S_2 = b_1b_2...b_n$ as the size of the set of indices where characters differ. Mathematically, we can write

$$d(S_1,S_2) = |\{i|a_i \ne b_i\}|$$

Related Question