The topological entropy of the Collatz map (extended to 2-adic integers)

arithmetic-dynamicscollatz conjecturedynamical systemsentropy

In the process of learning the basics of dynamical systems, I finished the chapter on topological entropy and decided, as an exercise, to try and compute the entropy of one of my favorite maps: $T : \Bbb{N} \to \Bbb{N}$ given by

\begin{equation*}
T(x) =
\left\{\begin{array}{ll}
\frac12 x & \text{if } x\equiv 0 \mod 2\\[0.3em]
\frac32 x + \frac12 & \text{if } x \equiv 1 \mod 2
\end{array}\right.
\end{equation*}

However, the definition of entropy from my book requires that the underlying space be compact, which $\Bbb{N}$ is not. But the $p$-adic integers are! And the choice for $p = 2$ is natural, since $T$ involves division by $2$

Let $\Bbb{Z}_2$ denote the $2$-adic integers, with each integer identified by its (possibly infinite) binary expansion

\begin{equation*}
\Bbb{Z}_2 \simeq \{0, 1\}^\infty = \{\ldots b_3b_2b_1b_0 : b_i \in \{0,1\} \; \forall i \geq 0\}
\end{equation*}

equipped with the usual norm and metric

\begin{gather*}
|x|_2 = 2^{-\ell}\;,\; \text{ where } \ell = \min \{i : b_i \neq 0 \}\\[1em]
d(x,y) = |x-y|_2
\end{gather*}

Then $T$ can be naturally extended to $\Bbb{Z}_2$, with addition, multiplication by $3$, and division by $2$ all understood as bitwise operations on these semi-infinite bitstrings. $T$ is continuous on $\Bbb{Z}_2$ (uniformly cts since $\Bbb{Z}_2$ is compact), and so we have a dynamical system $(\Bbb{Z}_2, T)$

The definition for entropy involves this family of metrics:

\begin{equation*}
d_n(x, y) = \max_{0\leq i < n} d\big(T^i(x), T^i(y)\big)
\end{equation*}

We compute $\operatorname{cov}(n, \varepsilon, T) = $ the cardinality of minimally $(n, \varepsilon)$-covering sets. A subset $A\subseteq \Bbb{Z}_2$ is $(n, \varepsilon)$-covering if $\forall y\in \Bbb{Z}_2\; \exists x\in A : d_n(x, y) < \varepsilon$. Finite covering sets exist $\forall n, \varepsilon$ by compactness, and $\operatorname{cov}$ is the minimum cardinality of any of them.

Then the entropy is

\begin{equation*}
h(T) = \lim_{\varepsilon\to0^+}\ \limsup_{n\to\infty}\; \frac1n \log\big( \operatorname{cov}(n,\varepsilon,T)\big)
\end{equation*}

I was able to obtain an upper estimate for $h(T)$ (proved below), but I can't see how to prove that the estimate is exact, or how to justify a smaller one.

Question: $h(T) \leq \log 2$. Is this estimate exact?


WLOG $\varepsilon \in \{\frac12, \frac14, \frac18, \ldots\}$, so that $d(x, y) < \varepsilon$ iff the binary expansions of $x$ and $y$ match in the first $N_\varepsilon = – \log_2 \varepsilon$ places.

Claim: Let $M_{n, \varepsilon} = n + N_\varepsilon$. Then $A = \{0, 1, 2, \ldots 2^M-1\}$ is $(n, \varepsilon)$-covering

The reason: For each $x \in \Bbb{Z}_2$, we can express $x = 2^Mq + r$ for a unique $r \in A$ by the usual division algorithm. Equivalently, $r$ corresponds to the binary string which lifts the first $M$ bits of $x$ and sets all larger bits to be $0$.

It is easy to see from this choice that the binary expansion of $r$ matches that of $x$ in the first $M > N_\varepsilon$ places, therefore $d(x,r) < \varepsilon$. Moreover, among the first $n$ iterates of $T$, we have the following relationship:

\begin{equation*}
T^i(x) = 3^{\#_{O,i}} 2^{M-i}q + T^i(r)
\end{equation*}

for $i = 0, 1, 2, \ldots M-1$. Here $\#_{O,i} = $ count of odd steps in the first $i$ steps of the trajectory of $x$ (the number of times $x$ was multiplied by $3$ among the first $i$ iterates of $T$).

This relationship follows from the fact that $T$ is an affine map which multiplies by $3$ for each odd step and divides by $2$ for each step, and the fact that the trajectory of $x$ is dominated by the parity of $r$ – ie, $T^i(x) \equiv T^i(r) \mod 2$ for every $i < M$ (since $x \equiv r \mod 2^M$)

Hence, we have the following distance estimate for each $i = 0, 1, 2, \ldots n-1$

\begin{align*}
d(T^i(x),T^i(r)) &= |T^i(x) – T^i(r)|_2 \\[0.5em]
&= |3^\#\cdot 2^{M-i} q|_2\\[0.5em]
&= |3^\#|_2\; |2^{M-i}|_2\; |q|_2\\[0.5em]
&\leq |2^{M-i}|_2\\[0.5em]
&=2^{i-M}
\end{align*}

And then $2^{i-M} < 2^{n-M} = 2^{-N_\varepsilon} = \varepsilon$, and so $d_n(x, r) < \varepsilon$ and the covering follows.

I suspect that this covering is not optimal, but at least we have $\operatorname{cov}(n, \varepsilon, T) \leq 2^{n+N_\varepsilon}$, in which case we obtain the entropy estimate

\begin{align*}
h(T) & \leq \lim_{\varepsilon\to0^+} \limsup_{n\to\infty} \frac1n \log 2^{n+N_\varepsilon}\\[0.5em]
&= \lim_{\varepsilon\to0^+} \limsup_{n\to\infty} (1+ \frac{N_\varepsilon}{n})\log 2\\[0.5em]
&= \lim_{\varepsilon\to0^+} \log 2
\\&= \log 2
\end{align*}

A subtlety: since we take the limit in $n$ first and $N_\varepsilon$ is independent of $n$, then $N_\varepsilon/n \to 0$ as $n\to\infty$

Best Answer

Okay, after thinking on it more, I think I can prove that $\log 2$ is exact.

Suppose $B\subseteq \Bbb{Z}_2$ is $(n, \varepsilon)$-covering, and WLOG $\varepsilon < \frac12$. Let $A = \{0, 1, 2,\ldots, 2^n-1\}$.

By $(n, \varepsilon)$-covering assumption, for each $r\in A$ there exists a $b \in B$ such that $d_n(r, b) < \varepsilon$. In particular, one can form a function $f:A\to B$ such that $d_n\big(r, f(r)\big) <\varepsilon$ for each $r\in A$.

Claim: $f$ is $1$-$1$. Were it the case, then $f(A) \subseteq B$ implies $2^n = \#A = \#f(A) \leq \#B$

(here $\#S$ denotes the cardinality of a finite set). It implies that $2^n \leq \#B$ for any $(n, \varepsilon)$-covering subset $B$ - in particular, $2^n \leq \operatorname{cov}(n, \varepsilon, T)$, and therefore

\begin{align*} \log 2 &= \lim_{\varepsilon\to0^+} \ \limsup_{n\to\infty}\ \frac1n \log 2^n \\[0.5em] &\leq \lim_{\varepsilon\to0^+} \ \limsup_{n\to\infty}\ \frac1n \log \operatorname{cov}(n, \varepsilon, T) \\[0.5em] &= h(T) \end{align*}


Proof of the claim: first note that clearly, for any distinct $r, s \in A$ we have $r\not\equiv s \mod 2^n$

So suppose to the contrary there exists a distinct pair $r, s$ such that $f(r) = f(s)$. Then we have

\begin{equation*} d_n(r, s) \leq d_n\big(r, f(r)\big) + d_n\big(s, f(s)\big) < 2\varepsilon < 1 \end{equation*}

Here we use the assumption that $\varepsilon < \frac12$. Then by the definition of $d_n$,

\begin{equation*} d\big(T^i(r), T^i(s)\big) < 1 \quad \forall i = 0, 1, 2, \ldots n-1 \end{equation*}

By the definition of the $2$-adic norm, we have

\begin{equation*} T^i(r)\equiv T^i(s)\mod 2 \quad \forall i = 0, 1, 2, \ldots n-1 \end{equation*}

An aside fact: the (shortened) Collatz map $T$ has the property that parity sequences of length $N$ are unique to residue classes modulo $2^N$.

That is, given any sequence $p_0p_1\ldots p_{N-1}\in\{0,1\}^N$, there exists a unique integer $x \in \{0, 1, 2, \ldots 2^N-1\}$ such that $T^i(x)\ \%\ 2 = p_i$ for each $i = 0,1 , 2, \ldots N-1$ (here $x\ \% \ 2$ is the canonical reduction of $x$ modulo $2$ into $\{0,1\}$)

Put another way, \begin{gather*} T^i(x)\equiv T^i(y)\mod 2 \quad \forall i = 0, 1, 2, \ldots n-1\\ \iff\\ x\equiv y \mod 2^n \end{gather*}

The proof is a simple inductive argument, following from the fact that $\frac32(2k+1) + \frac12 = 0$ and $\frac12(2k) = 0$ each have unique solutions for $k$ modulo $2$. The property extends to all $2$-adic integers, but since $r$ and $s$ are finite integers we can apply this fact without objection.

From this, the conclusion is $r\equiv s \mod 2^n$, which contradicts the fact that $r$ and $s$ are distinct elements of $A$. The lower estimate for $h(T)$ follows by the previous argument, and so therefore $h(T) = \log 2$

Related Question