[Math] Probability and the Collatz Problem

collatz conjecturenumber theory

A long time ago I was messing around with the Collatz problem and I stumbled across a "proof" that the iterations will converge. I was too embarrassed to show anyone, and I recently just remembered it. So I will put a sketch of it here and I want to know where it falls down.
Define $C(n)$ as one iteration of the Collatz function on a positive integer $n$.
Assume we start at a random positive even integer $n$.
Define the probability that $C(n)$ is even by $P(C(n)\in Even)$, or just $P(C(n))$.
For a random start number $P(C(n))=0.5$. (remember n is even)

But $P(C^{2}(n))=1/2\times P((C(n))+ (1-P(C(n)))$. (By simple probability tree)

And so on, in general $P(C^{k}(n))=1/2\times P(C^{k-1}(n))+(1-P(C^{k-1}(n)))$
$=1-1/2\times P(C^{k-1}(n))$
Which gives us a geometric progression (with multiple -1/2) as we work our way back into terms of $P(C(n))$.

As $k \to \infty $, $P(C^{k}(n))\to \frac{1}{1-(-1/2)}=2/3$.

So what? Well, this implies that as k gets bigger our iterations will settle down to something like an even,even,odd cycle (2/3 are even). Assume that on k iterations of the Collatz function, we end up at a larger number than our initial number. This implies (assuming WLOG we have an even number, x)
$3\times \frac{x}{4}+1$ is greater than $x$.
Or, $x<4$. But we have checked for all numbers less than four and the Collatz function converges, so there is no number that will diverge.

I know this is a very sketchy sketch of a "proof". I just want to know what you think. If it is nonsense please say so.

Edit: I also tried a different Collatz style iteration, with $3x-1$ instead of $3x+1$. As is observable this doesn't seem to converge. Using the same arguments as my original question, we can see why. $3\times \frac{x}{4}-1$ is greater than $x$ when $x<-4$.

I also tried similar other changes using $3x+a$ for $a\in {3,5,7,…}$. If the iterations converge for all integers less than $4\times a$ then it is convergent for all (assuming my "proof" makes sense)

Second Edit:
I want to reply to some issues raised. Bare in mind this is not a formal proof, it's just an idea of a way of making a proof.

On the objection that, just because on average the iterations will yield a 2/3 chance of an even number, there may in fact be sequences that do not reduce to an even-even-odd sequence:

If a particular sequence doesn't end in a 2/3 chance of an even number, and in fact it contains more odd numbers, then (now I'm not sure how to say this mathematically) that sequence is as dense in the integers as the sequences that cycle to 1-4-2, as for any even number in the sequence there is an infinite number of even numbers that also join this sequence (i.e. just double any even number in the sequence again and again…).

Therefore since the average sequence ends in a 2/3 chance of an even number after $k\to \infty$ iterations, there must be a sequence that has a higher than 2/3 chance of yielding an even number. I cannot prove it yet but I think, just on considering it, that such a sequence cannot exist, because:

  • as the sequence progresses the numbers would get smaller and smaller (e.g. if the prob of an even number is 3/4 then we can consider a even-even-even-odd sequence, which is the same as a even-even-odd sequence but with an extra division by two, so it would necessarily be smaller than the starting number. Meaning (a) it couldn't cycle (b) it couldn't diverge to infinity. i.e. it is impossible.

(BTW the above type of proof I'm trying to make is based on a type of proof I forget the name of – if the average of something is x, but you can find something with a value less than x, then there must exist an object with a value greater than x.)

On the objection that I am abusing the term probability and that this is akin to saying "there are no primes, because as we get to bigger and bigger numbers the probability of a certain number being prime approaches zero:

I think that, although I haven't defined how I'm using probability axioms, I think saying $C(n)$ can either be even or odd, and given an arbitrary $n$ it is possible to talk about the probability of it being even. If $C(n)$ is even, then $C(n+2)$ is odd so it doesn't matter where you are in the integers, the probability is always 1/2. I'm not sure why this is controversial.

Best Answer

One problem with your approach is that a probabilistic argument doesn't let you do much of anything with the statement about the behavior of all natural numbers as the Collatz rule is iterated. For example, there are infinite increasing sequences of natural numbers such that the probability that a natural number between $1$ and $n$ is among this sequence goes to $0$. But there is also another flaw in your approach, which is that you calculate $P(C^k(n))$ as if it only depended on $P(C^{k-1})$. According to your formula $$P(C^3(n)) = 1-1/2\times P(C^2(n)) = 1-1/2\times (1-1/2\times P(C(n))) = 1-1/2\times(1-1/4)=5/8$$ meaning that $P(C^3(n))$ does not depend on whether or not $C(n)$ is odd. This seems reasonable on the surface, as you would expect $\frac{3C(n)+1}{2}$ to be even just as often as it is odd (which is the assumption being made). However, this is not necessarily the case (I believe it to be false, and if it is true it certainly requires lots of justification), as the values produced by the Collatz iteration are not evenly distributed (for example, powers of two show up a lot), so we cannot use that this is true for evenly distributed $C(n)$ (note that since you are inducting, $C(n)$ may have gone through many Collatz iterations). If you are still not convinced, there is another stumbling block due to a property of the distribution of prime numbers called Chebyshev's bias. $\frac{3C(n)+1}{2}\cong 0\mod 2$ iff $C(n)\cong 1\mod 4$, which requires that $C(n)$ have an even number of prime factors congruent to $3\mod 4$. But Chebyshev's bias indicates that more of its factors are probably congruent to $3\mod 4$ than to $1\mod 4$, meaning we would expect the probability that $\frac{3C(n)+1}{2}\cong 0\mod 2$ not to conform nicely to $1/2$ (and it wouldn't stick to one side either, but fluctuate) even if we appeal to a distribution based on factors rather than size. This is but one of the many examples of complications in the distribution of primes and factors that make any inductive statement like yours impossible to make. The sequence $C(n)\mod 2,C^2(n)\mod 2,\ldots$ simply gives too much information about the upcoming terms $\mod 2$ to be ignored, yet the information is too subtle at present for us to account for.

Related Question