Does there exist infinite words using the alphabet $\{A,B,C,D\}$ that avoids patterns $XX,\ XAX,\ XBX,\ XCX,\ XDX$

combinatorics-on-wordssequences-and-series

Another form of this question is: Does there exist a gap-1 square-free infinite word using the alphabet {A,B,C,D}?

Normally square-free in this context means that there are no sub-words twice in a row that follows the pattern XX. For example the words AA, CABABC, ABCABC all aren't square-free. Words that are gap-1 square-free not only avoids sub-words twice in a row but also avoids identical sub-words that are one letter apart for example ABA DCABCAD ABCDABC are all square-free but not gap-1 square-free. All of the patterns that gap-1 square free avoids are XX, XAX, XBX, XCX, XDX.

It can be shown by exhaustion that there are finitely many words using the alphabet {A,B,C} That are gap-1 square-free. This is the entire list: A, AB, ABC, ABCA, AC, ACB, ACBA, B, BA, BAC, BACB, BC, BCA, BCAB, C, CA, CAB, CABC, CB, CBA, CBAC

A reduction of the problem can be done by first establishing a correspondence between words using the alphabet {A,B,C,D} and four-vertex directed graphs. Where the four vertices are labeled A,B,C,D with $N-1$ edges. Where $N$ is the number of letters in the word that the graph corresponds to. The tail of first edge in the graph starts with the vertex whose label is the same as the first letter in the Word. The head of the first edge ends with the vertex whose label is the same as the second letter in the word. In general the tail of
the $k^{th}$ edge corresponds to $k^{th}$ letter of the word and the head of the $k^{th}$ edge corresponds to the $k+1^{th}$ letter. Where $1\leq k\leq N-1$. For example the word ABC would correspond to the four-vertex two-edge graph. Where the first edge has tail on vertex A and head on vertex B. The second edge has tail on vertex B and head on vertex C. All of these corresponding graphs cannot have loops because all words are gap-1 square-free. The tail of the second edge must start at the head of the first edge. The head of the second edge cannot end at the tail of the first edge because the words are gap-1 square free. With these conditions on the graphs there is only one non-isomorphic graph with two edges. This implies that the set of all words that are length three or greater that do not start with ABC are isomorphic to the set of words that start with ABC. So if there are no infinite words it is sufficient to show that no words that start with ABC are infinite.

here are a couple of links that might be useful:

Does there exist an infinite number string without any 'refrain'?

https://en.wikipedia.org/wiki/Square-free_word

EDIT: I wrote a computer program to find long words that are gap-1 square-free using the A,B,C,D alphabet. There exists words of this type that are more than $10^6$ letters long.

EDIT $2$: There exists an infinite gap-1 square-free word using the alphabet {A,B,C,D,E,F}. Proof:

Let $I$ be an infinite square-free word using the alphabet {A,B,C}. (There are examples of these in the links I have already provided.) Let $I^*$ be defined follows: $I^*$ is obtained by replacing every instance of A in $I$ with AD every instance of B with BE and every instance of C with CF. I will call the AD, BE, CF pairs "blocks". Let $S_1$ be the of letters {A,B,C} and let $S_2$ be the set of letters {D,E,F}. Let $x_1$ and $x_2$ be sub-words of $I^*$ that are of the same length and are adjacent or are one letter apart. When comparing the letters of the two words to see if they are the same there are two cases. In the first case the blocks of $x_1$ and $x_2$ could be out of alignment. For example if $x_1$ is ADB and $x_2$ is ECF. The first two letters of $x_1$ is in one block and the last letter is in a second block where as the first letter of $x_2$ is in one block and the last two letters are in a second block. If the blocks are out of alignment none of the letters between $x_1$ and $x_2$ will match each other because the letters will be in opposing sets. (If the first letter of $x_1$ is in $S_1$ then the first letter of $x_2$ will be in $S_2$, and so on.) In the second case (if the blocks are aligned), The first letter of a block in $x_1$ will match the first letter of the corresponding block in $x_2$ if and only if the second letter of the same block in $x_1$ will match the second letter of the corresponding block in $x_2$. So $x_1$ will match $x_2$ if and only if the blocks of $x_1$ match the blocks of $x_2$. In case 2 the blocks of $x_1$ must be adjacent to the blocks of $x_2$. So it is enough to show that the blocks of $I^*$ are square-free. There is a direct correspondence between the blocks of $I^*$ and the letters of $I$. $I$ is by definition square-free. QED

Best Answer

Okay, I apologize for some of my previous sloppy attempts. This time I’m pretty sure my answer is correct.

In this related post, it is asked whether there exists an infinite string of three characters $1,2,3$ that avoids patterns of the form $XX$. There do exist such sequences, and the top-voted answer shows how to construct such a sequence

$$1,2,3,1,3,2,1,2,3,2,1,3,1,2,3,1,3,2,...$$

by repeatedly applying the transformation $1\mapsto 123$, $2\mapsto 13$, and $3\mapsto 2$. Let’s call this sequence $\mathcal{S}$.

Now we’re going to apply a complicated transformation to $\mathcal{S}$ to construct a sequence satisfying your constraints. Define some subsequences as follows:

$$U_1=ABCDB,\space V_1=ABDCB$$ $$U_2=ACDBC,\space V_2=ACBDC$$ $$U_3=ADBCD,\space V_3=ADCBD$$

Now take the sequence $\mathcal{S}$ and proceed as follows. Replace the odd-numbered terms (starting with the first term) with $U_i$, where $i$ is the actual term of the sequence $\mathcal{S}$, and replace the even-numbered terms with $V_i$. You will get something looking like

$$U_1 V_2 U_3 V_1 U_3 V_2 U_1 V_2 U_3 V_2 U_1 V_3 ...$$

Now replace $U_i$ and $V_i$ with their corresponding string of $A,B,C,$ and $D$ as defined above. This gives us the string

$$ABCDB \space ACBDC \space ADBCD \space ABDCB \space ADBCD \space ACBDC \space ABCDB\space ...$$

Call this string $\mathcal{S}^{*}$.

We can check case-by-case that none of the simple concatenations $U_i V_j$ or $V_i U_j$ for $i\ne j$ results in a “forbidden pattern” of the form $XX$, $XAX$, $XBX$, $XCX$, or $XDX$. Also, the second and fifth characters of each five-block chunk of this sequences uniquely determine which character ($1$,$2$, or $3$) that chunk corresponds to in $\mathcal{S}$. In other words, either of $\mathcal{S}^{*}[5n+1]$ and $\mathcal{S}^{*}[5n+4]$ uniquely determines the value of $\mathcal{S}[n]$, meaning that a pattern of the form $XX$, $XAX$, $XBX$, $XCX$, or $XDX$ would imply the existence of a pattern of the form $XX$ in $\mathcal{S}$, which is impossible.