[Math] Intuition behind lack of cycles in the Collatz Conjecture

collatz conjecturenumber theory

The Collatz Conjecture concerns the function

$f(n) = \begin{cases}
n/2, & \text{if $n$ is even} \\
3n+1, & \text{if $n$ is odd}
\end{cases}$
.

The conjecture says that if you start with any natural number and repeatedly apply the function, you will eventually end up at 1, after which you will cycle forever between 1, 4, and 2.

To prove the conjecture, one would need to prove that

  1. No sequence generated in this fashion grows without bound.
  2. No sequence generated in this fashion falls into a cycle other than the aforementioned 1, 4, 2 cycle.

The first of these requirements seems very likely to be true, but I don't see why the second holds up so well (computers have verified an absence of other cycles for mindbogglingly large starting numbers). It seems to me that if you're bouncing around the integers in a relatively random pattern, every now and then you should end up back where you started.

Is there an intuitive explanation for why cycles are so rare?

Best Answer

Consider the "compacted" formula (as also used in the "Syracuse"-version of the Collatz), defining one "step" beginning on $a$ going to $b$ (both odd $\ge 1$): $$ b = {3a+1\over 2^A } \tag 1$$ where $A$ contains the number of halving-steps.


Now to have a (very) simple cycle of just one step we must have: $$ a = {3a+1\over 2^A } \tag 2$$ It is simple to show, there is no solution except when $a=1$ only by rearranging: $$ a = {3a+1\over 2^A } \\ 2^A = {3a+1\over a } $$ $$ 2^A = 3+{1\over a } \tag3\\ $$ and having the rhs a perfect power of $2$ requires $a=1$, thus allowing (only) the well known "trivial" cycle $1 \to 1$.


Now extending this to a two-step cycle we need to have $$\begin{array}{} b = {3a+1\over 2^A } & a = {3b+1\over 2^B }\\ \end{array} \tag 4$$ with some so far undetermined $A$ and $B$.
To see better the space of possible solutions we can take the product of the lhs's $a \cdot b$ and which must equal the product of the rhs's: $$\begin{array}{} a \cdot b = {3a+1\over 2^A } \cdot {3b+1\over 2^B } \end{array}$$ resulting in the required equality

$$\begin{array}{} 2^{A+B} = (3+{1\over a}) \cdot (3+{1\over b }) \end{array} \tag 5$$ This shows, we can only have a solution if the rhs becomes at least integer, which is difficult enough , but actually must even equal a perfect power of 2, larger than $3^2=9$ and namely must equal $16$. Now what values must $a,b$ have such that the rhs equals $16$? Both must contain $a=b=1$ and that is the already known trivial cycle, and no other solution in possible.


Now you see the pattern, how this generalizes to the disproof of the 3-step-cycle, 4-step-cycle, and so on for the N-step-cycle.

Unfortunately, for several $N$ the possibility for small $a,b,c,...$ exists "theoretically", which means, that the rhs-product can reach a perfect power of 2 by some $1 \lt (a_1 \ne a_2 \ne a_3 \cdots \ne a_N)$ - one might just try some $N$ by hand, remembering the conditions that all members of an assumed cycle should be greater than $1$, should be odd, should not be divisible by $3$ and that all involved $a_k$ should be different from each other, to get a better intuition.

Now to proceed more we introduce the knowledge, that by simple heuristics we already know, that $a_k \gt 1000$ (can be done manually with a programming language) or $a_k \gt 1 000 000 $ and even $a_k \gt 2^{60} $ (the latter by an extensive numerical search by de Oliviera and by Rosendaal).
If we introduce that knowledge and assume the minimal $a_1$ being, say $a_1=1+2^{60} $ , $a_2 = {3a_1+1\over 2^{A_1}} $ and so on , we find that for all manually reachable $N$ the rhs is disappointingly near to the value of $3^N$ and no perfect power of $2$ is only in any realistic distance from that. One can find, using the continued fraction of $\beta = \log(3)/\log(2)$, we need $N \gt 150 000 $ (or even much more don't have the actual value at hand).


Well, this shall just give an intuition as to why cycles are much unlikely.
If you like to do more, allow negative numbers for $a_1,a_2,...$ and see how and which solutions for small $N$ you can get.

Or proceed and compare the $5x+1$ problem with this: we find actually two or three possible cycles for small $N$ and $a_1 \lt 100 $ but after that the above method can be used to say much more about the nonexistence of certain $N$-step cycles. I've even found two cycles for the $181 x+1$-version having $N=2$ and small $a_1 \lt 100$ but after that, the formula comparing the rhs-product for $N$ steps to perfect powers of $2$ indicates the same "difficulty" for cycles to exist.


[Update] Just to have an example, how a (possibly infinite, don't know at the moment) set of $N$ can be ruled out for a solution to exist.
Assume for example $N=12$. So we have on the rhs 12 parentheses of the form $(3+1/a_k)$, whose product should equal $2^S$ (where I use in general the letter $S$ for $S=A_1+A_2+...A_{12}$, just the sum of exponents, which is also the number of even-steps). Then what is the next possible perfect power of 2 above $3^{12}$? we get, using $\beta=\log(3)/ \log(2)$: $S=\lceil N \cdot \beta \rceil = 20$,thus we have the condition $$2^{20} =(3+ {1\over a_1})(3+ {1\over a_2})...(3+ {1\over a_{12}}) \tag 6$$and the solution for smallest $1 \ne (a_1 \ne a_2 ... \ne a_{12})$ is $$2^{20} \overset?=(3+ {1\over 5})(3+ {1\over 7})(3+ {1\over 11})...(3+ {1\over 37}) \tag 7$$ Of course we could simply compute the values of the LHS and the RHS getting $$ 2^{20}= 1048576 \gt 697035.738776 $$ and because increasing the values for the $a_k$ would even decrease the value of the rhs there is no solution and thus no $N=12$-step cycle. But for the sake of generality we proceed differently.
Instead we do a rough estimate for the mean-value of the $a_k$. Assume all $a_k$ are equal to their "mean" $a_m$, then we can rewrite the equation $$ 2^S = (3+{1\over a_m})^N \\ 2^{S/N} = (3+{1\over a_m})\\ 2^{S/N} -3 = {1\over a_m}\\ {1 \over 2^{S/N} -3} = a_m \tag 7 $$ $$ a_m = {1 \over 2^{20/12}-3}\approx 5.72075494219... \tag 8 $$ and because $a_m$ is somehow a rough mean, some $a_k$ must be smaller and some must be larger. But there is only one possible value for any $a_k \lt a_m$ namely $a_1=5$. After that, rearranging one parenthese with that value assumed to the lhs in eq(6) we get $$ 2^S/(3+1/5) = (3+{1\over a_m})^{N-1} \\ 2^{20} \cdot 5/16 = (3+{1\over a_m})^{11} \\ a_m \approx 5.79638745091$$ and we find, that there is no way that $a_m$ can be a rough mean of the remaining $11$ $a_k$ since there is no more odd integer $a_k$ in $1 \ne 3 \ne 5 \lt a_k $ so a $N=12$-step-cycle cannot exist.
We see nicely, that for some $N$ we can exclude the possibility of a cycle just based on the basic assumptions on the form of the members of a possible cycle $a_k$ by the formula (6) and for such $N$ do not need to recurse to the $a_k \gt 2^{60}$ found by de Oliviera and Rosendaal.

However, and this leads to the observation that we need for the general solution of the cycle-problem some deeper thinking, there are some $N$ for which $2^S$ is comfortably near to $3^N$ so we can allow a set of small $a_k$ such that the rhs can approach the lhs. The continued fraction of $\beta$ gives us pairs of $N$ and $S$ where $2^S$ are especially near to $3^N$ and for which a cycle cannot be excluded by this method alone.


[update 2]
I've not done this before explicitely, but trying the continued fraction-convergents and filling into $a_k$ the set of consecutive smallest possible integers ($5,7,11,13,...)$ we get the following small table

N           S      lhs=2^S            rhs = (3+1/a_k)*()...    lhs/rhs        a_m
    1,      2,             4       ,          3.2        ,      1.25               1    ~ 2^0
    5,      8,           256       ,        292.571428571,      0.875             31.81 ~ 2^5
   41,     65, 3.68934881474 E   19, 5.44736223436 E   19,      0.677           1192.08 ~ 2^10
  306,    485, 9.98959536101 E  145, 5.57867460455 E  146,      0.179          99780.79 ~ 2^16
15601,  24727, 3.70427126979 E 7443, 1.06756786898 E 7444,      0.346      285817586.21 ~ 2^28
79335, 125743, 2.59863196329 E37852, 8.97264176433 E37852,      0.289     7216102492.69 ~ 2^33

and we see, that the RHS can reach the LHS for that specific $N$ and thus the assumption of the smallest possible values for the $a_k$ does not suffice to exclude the possibility of a cycle. The "mean" $a_m$, estimated by the geometric mean of all parentheses as proposed above, are in the last column; they increase with the size of $N$ and we get an impression as at which cyclelength $N$ this allows values of $a_1 >2^{60}$ and thus this method has its heuristical limit: the last entry in the last row means $a_m \approx 2^{33}$ and this means, that the knowledge that $a_1>2^{60}$ suffices to have disproved all cycles of steps $N \le 79335$.

But it might anyway be interesting to see, what explicitely smallest value for $a_1$ (which we can assume to be the smallest of the cycle) and the following sequence of consecutive possible members would suffice to have the LHS smaller than the RHS. That would surely be a nice exercise ...

Related Question