How many words of length $n$ can be formed which do not contain $k$ consecutive repeated characters

combinatoricscombinatorics-on-wordspermutations

I am struggling with a question regarding counting number of possible words of length n which do not contain k consecutive equal characters, given that $2 \leq k \leq n$. The words can be formed only using lowercase English letters. So, if we had n = 3, k = 2, the number of possible words is $26^3 – 26 – 2\times26\times25 = 16250$. This corresponds to the case where no consecutive letters are the same. For n = 3, k = 3, we want to count the words of type: "…" (no 2 consecutive letters are the same), "aa_", "_aa". The number of ways to do this is $26^3 – 26 = 17550$.

My approach so far has been to use the inclusion-exclusion principle by counting the exact repetitions for k, k+1, k+2,…n consecutive repetitions and subtract that from $26^n$, but I am probably missing some cases. The formula I used to find number of words with k consecutively same characters is $(n – k + 1)\cdot 26 \times 25^{n – k} + \sum_{c=2}^{\lfloor n/k \rfloor}$ $n – ck + r – 1 \choose r – 1$ $26 \times 25^{n – ck + c – 1}$, where $\lfloor a \rfloor$ represents the floor of $a$ and $r = c + 1$. This simplifies to: $(n – k + 1)\cdot 26 \times 25^{n – k} + \sum_{c=2}^{\lfloor n/k \rfloor}$ $n – c(k -1) \choose c$ $26 \times 25^{n – c(k -1) – 1}$. So, my final answer is:

$26^n – \sum_{k = k}^{n} \Bigl[(n – k + 1)\cdot 26 \times 25^{n – k} + \sum_{c=2}^{\lfloor n/k \rfloor}$ $n – c(k -1) \choose c$ $26 \times 25^{n – c(k -1) – 1}\Bigr]$.

This does not give the correct answer in all cases however. For smaller values of $n, k$, this seems to be correct but when $n, k$ are larger (in the 100s), I am getting the wrong answer. I can see one obvious case I am missing, which is when there are repetitions of different groups of numbers, for instance if there are 3 consecutive a and 4 consecutive b in the word. My method only counts the cases where there are exactly 3 consecutively same letters and exactly 4 consecutively same letters, but not cases where both are present. I don't know how to tackle this case though. Or am I approaching this problem completely wrong and are there other simpler ways to solve this?

Note: This problem can be solved programmatically as well, so if you can do this in code that works too! This is my code in Python for my approach described above, returning the answer as modulo($10^9 + 7$):

def calculate_exact_reps(n, k):
    res = (n - k + 1) * 26 * 25 ** (n - k)
    for c in range(2, n//k):
        res += math.comb(n - c * (k - 1), c) * 26 * 25 ** (n - c * k + c - 1)

    return res

def countValid(n, k):
    MOD = 10**9 + 7
    if k == n:
        return (26**k - 26) % MOD
    if k == 2:
        return (26 * 25**(n - 1)) % MOD
    res = 0

    # inclusion-exclusion
    res = 26 ** n - 26
    for i in range(k, n):
        res -= calculate_exact_reps(n, i)

Best Answer

Although Inclusion-Exclusion generalizes well, it is not always the best approach to such a problem. In this problem, both Inclusion-Exclusion and the Direct approach result in messy Math. From my limited knowledge, that leaves recursion.

Let $f(n)$ denote the number of satisfying words (no instance of $k$ consecutive letters), as a function of $(n)$. Index the letters, from left to right as $L_1, L_2, \cdots, L_n.$ This means that the rightmost letter is $L_n$ and the leftmost letter is $L_1$.

For $r \in \{1,2,\cdots,k-1\},$ let $f(n,r)$ denote the number of satisfying words that end with $r$ consecutive letters. That is:

Let $f(n,1)$ denote the number of satisfying words,
where $L_{n-1} \neq L_n$.

Let $f(n,2)$ denote the number of satisfying words,
where $L_{n-1} = L_n$ and $L_{n-2} \neq L_{n-1}.$

Let $f(n,3)$ denote the number of satisfying words,
where $L_{n-2} = L_{n-1} = L_n$ and $L_{n-3} \neq L_{n-2}.$

Let $f(n,4)$ denote the number of satisfying words,
where $L_{n-3} = L_{n-2} = L_{n-1} = L_n$ and $L_{n-4} \neq L_{n-3}.$

$\cdots$

Let $f(n,k-1)$ denote the number of satisfying words, where $L_{n-(k-2)} = L_{n-(k - 3)} = \cdots = L_{n-2} = L_{n-1} = L_n$ and $L_{n-(k-1)} \neq L_{n-(k-2)}.$


It is to be understood that for $n,r,k \in \Bbb{Z^+},$ such that $n < r \leq k-1,$ that $f(n,r) = 0.$ That is, when $n$ is less than $k-1$, you can't have a word with $n$ letters, that ends with more that $n$ consecutive letters.

Then, you have the following formulas that illustrate how $f(n)$ may be recursively computed:

  • $\displaystyle f(n) = \sum_{r=1}^{k-1} f(n,r).$

  • $f(n+1,1) = f(n) \times 25.$
    That is, you have $25$ choices for constructing a satisfying word with $n+1$ characters by appending a character to the end of any satisfying word with $n$ characters, such that $L_n \neq L_{n+1}.$

  • For $2 \leq r \leq k-1$,
    $f(n+1,r) = f(n,r-1)$.
    That is, there is only one way of converting a $n$ character sequence that ends with $(r-1)$ consecutive letters into a $(n+1)$ character sequence that ends with $(r)$ consecutive letters.


The following table, with $k = 4$ illustrates how $f(n)$ may be recursively computed for $n \in \{1,2,3,4,5\}.$

\begin{array}{| r | r | r | r | r |} \hline n & f(n,1) & f(n,2) & f(n,3) & f(n) \\ \hline 1 & 26 & 0 & 0 & 26 \\ \hline 2 & 26 \times 25 & 26 & 0 & (26)^2 \\ \hline 3 & (26)^2 \times 25 & 26 \times 25 & 26 & (26)^3 \\ \hline 4 & (26)^3 \times 25 & (26)^2 \times 25 & 26 \times 25 & \displaystyle 25 \times \left[\sum_{a=1}^3 (26)^a\right] \\ \hline 5 & \displaystyle (25)^2 \times \left[\sum_{a=1}^3 (26)^a\right] & (26)^3 \times 25 & (26)^2 \times 25 & \displaystyle (25)^2 \times \left[\sum_{a=1}^3 (26)^a\right]\\ & & & & + \left[(26)^3 \times 25\right] \\ & & & & + \left[(26)^2 \times 25\right]\\ \hline \end{array}