Size of permutation subset for a given Hamming distance

combinatorics, permutations

I am looking for a way to compute the size of a subset of permutations with replacement, but I can't find the right formula. I'm working with RNA sequences (A, U, C and G) of fixed length and trying to determine the size of a specific subspace of permutations. Say the sequence has 4 bases (string length = 4): starting from AAAA, there are 255 other possible permutations. Using the Hamming distance (Hamming(AAAA,AAAU)==1, Hamming(AAAA,GGGG)==4…), I want to know how many permutations fall at each distance. I have done it manually for this trivial example, and the numbers are 12, 54, 108 and 81 (for a total of 255) for distances 1 to 4 respectively. Is there any way to compute the size of each subset without having to generate and iterate through all possible permutations?

Best Answer

To determine how many sequences of length $n$ have a Hamming distance of $k$ to the original sequence, you can count them this way:

First, choose which $k$ of the $n$ positions in the sequence will be modified: there are $\binom{n}{k}$ ways to do this.

Second, for each modified position, pick a replacement among the 3 bases that differ from the original one: that's $3^k$ possibilities.

In total, there are $3^k\binom{n}{k}$ such sequences. For $n=4$ this gives 12, 54, 108 and 81 for $k=1,\dots,4$, matching the counts in the question.
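As a quick sanity check, the formula can be compared against a brute-force enumeration for the length-4 example from the question. This is only a minimal sketch: the function names `count_at_distance` and `brute_force_counts` are made up for illustration, and the alphabet is assumed to be the four RNA bases. The enumeration is there purely to confirm the closed form; for real sequence lengths only the formula is practical.

```python
from itertools import product
from math import comb  # requires Python 3.8+

ALPHABET = "AUCG"  # assumed alphabet: the four RNA bases


def count_at_distance(n, k, alphabet_size=4):
    """Closed form: choose k positions, then one of (alphabet_size - 1)
    alternative characters at each chosen position."""
    return comb(n, k) * (alphabet_size - 1) ** k


def brute_force_counts(reference, alphabet=ALPHABET):
    """Enumerate every sequence of the same length and tally it by
    Hamming distance to the reference (index = distance)."""
    n = len(reference)
    counts = [0] * (n + 1)
    for candidate in product(alphabet, repeat=n):
        distance = sum(a != b for a, b in zip(reference, candidate))
        counts[distance] += 1
    return counts


formula = [count_at_distance(4, k) for k in range(5)]
brute = brute_force_counts("AAAA")

print(formula)  # [1, 12, 54, 108, 81]
print(brute)    # [1, 12, 54, 108, 81]
```

The entry at index 0 is the reference sequence itself, so the five counts sum to $4^4 = 256$. For a different alphabet size $q$, the same argument gives $(q-1)^k\binom{n}{k}$, which is what the `alphabet_size` parameter reflects.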
