Explaining Property of Second-to-Last Bits of Primes


For $i \geq 0$, let $p_i$ denote the $i$-th prime: $p_0 = 2, p_1 = 3, \ldots$ Then $b_i$ denotes the second-to-last bit of $p_i$, i.e. $b_i = \left\lfloor p_i/2 \right\rfloor \bmod 2.$

The sequences $B_0$ and $B_1$ are constructed by the following algorithm: if $b_i$ is equal to $i \bmod 2$, append $b_i$ to $B_0$; otherwise, append $b_i$ to $B_1$. Thus $$\begin{array}{l}
B_0 =
1010101101010101010101101010101010011001001111001101001011001010\ldots,\\
B_1 =
1101010101011001011010101100101010101010010101011101010101010101\ldots
\end{array}$$

In essence, both sequences are the alternating pattern $\ldots1010\ldots$, occasionally interrupted by runs of two or more identical bits. For example, the first $10^6$ bits of $B_1$ do not even contain ten consecutive zeros. What explains this phenomenon?
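For reference, the construction above can be reproduced with a short script (a sketch; the `primes` helper is just a naive trial-division generator and the variable names are mine, not part of the question):

```python
def primes(count):
    """Yield the first `count` primes by naive trial division."""
    found = []
    n = 2
    while len(found) < count:
        if all(n % q for q in found if q * q <= n):
            found.append(n)
            yield n
        n += 1

B0, B1 = [], []
for i, p in enumerate(primes(10000)):
    b = (p // 2) % 2                      # second-to-last bit of p_i
    (B0 if b == i % 2 else B1).append(b)  # the sorting rule from the question

print("".join(map(str, B0[:64])))
print("".join(map(str, B1[:64])))
```

The two printed lines should match the prefixes of $B_0$ and $B_1$ shown above.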

Best Answer

I think the phenomena you observe are explained more by the way you construct $B_0$ and $B_1$ than by the properties of primes (though properties of primes seem to play a role). If we suppose instead that $b_i$ were random, obtained by a coin flip, it's easy to see that for each bit of $B_0$ or $B_1$, the probability that the next bit is the same is $\frac{1}{3}$, and the probability that the next bit is different is $\frac{2}{3}$.

Indeed, say $b_i \equiv i \mod 2$ for some $i$, so at time $i$ we have just added $b_i$ to $B_0$.

Then if the next $k-1$ bits are not congruent to their position mod $2$, followed by the $k$th bit congruent to its position, $i+k$, mod $2$, we next append the bit $i+k \mod 2$ to $B_0$. This occurs with probability $\frac{1}{2^k}$, and in each sequence this occurs for exactly one value of $k$.

So we get a different bit with probability $\frac{1}{2} + \frac{1}{2^3} + \frac{1}{2^5} + \dots = \frac{2}{3}$, and the same bit with probability $\frac{1}{2^2} + \frac{1}{2^4} + \frac{1}{2^6} + \dots = \frac{1}{3}$. The $B_1$ case is identical.
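This prediction is easy to check empirically (an illustrative sketch, with coin flips standing in for the prime-derived bits; seed and sample size are arbitrary):

```python
import random

# Replace the prime-derived bits b_i with fair coin flips, run the same
# B0 construction as in the question, and measure how often adjacent
# bits of B0 agree. The analysis above predicts 1/3.
rng = random.Random(0)  # fixed seed for reproducibility
N = 10**6
B0 = []
for i in range(N):
    b = rng.randrange(2)
    if b == i % 2:          # the sorting rule from the question
        B0.append(b)

same = sum(B0[k] == B0[k + 1] for k in range(len(B0) - 1))
print(same / (len(B0) - 1))  # close to 1/3
```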

So the probability of a run of $10$ consecutive zeroes starting at any given point is $\frac{1}{2} \cdot \frac{1}{3^9}$. However, by a classical argument, the expected time before seeing the first run of $10$ consecutive zeroes is a little longer than the inverse of that: about $3^{10}-1=59048$.
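The waiting time can be pinned down by a first-step analysis (a sketch; the state bookkeeping below is my own, and boundary conventions can shift the answer by $O(1)$). Writing $T_j$ for the expected number of further bits needed when the current run of zeroes has length $j$, and $E_1$ for the expectation when the current bit is a one, the chain with same-bit probability $\frac{1}{3}$ gives $T_j = 1 + \frac{1}{3}T_{j+1} + \frac{2}{3}E_1$ and $E_1 = \frac{3}{2} + T_1$, with $T_{10} = 0$:

```python
from fractions import Fraction

n = 10
third = Fraction(1, 3)
# Write T[j] = alpha + beta * T1 and propagate backwards from T[n] = 0,
# using T[j] = 1 + (1/3) T[j+1] + (2/3) E1 and E1 = 3/2 + T1.
alpha, beta = Fraction(0), Fraction(0)
for j in range(n - 1, 0, -1):
    alpha = 1 + third * alpha + 2 * third * Fraction(3, 2)
    beta = third * beta + 2 * third
T1 = alpha / (1 - beta)        # solve T1 = alpha + beta * T1
E1 = Fraction(3, 2) + T1
# First bit is 0 or 1 with probability 1/2 each:
wait = 1 + Fraction(1, 2) * T1 + Fraction(1, 2) * E1
print(float(wait))  # 59047.75, agreeing with 3**10 - 1 = 59048 up to O(1)
```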

So we waited about 17 times longer than one would expect before finding a run of ten zeroes. (But you had four choices, zeroes or ones in $B_0$ or $B_1$, when picking such a long run.) That's not completely surprising, but it's a lot less strange than in the naive random model, where the expected wait time is $2^{11}-1$, so there you would have waited about $489$ times longer than expected.


More generally, if we replace $b_i$ with a Markov chain that takes the same value on adjacent steps with probability $p$, then the probability that $B_0$ takes the same value on adjacent steps should be $$p^2 + p^2(1-p)^2 + p^2 (1-p)^4 + \dots = \frac{ p^2}{ 1- (1-p)^2} = \frac{p^2 }{ 2p -p^2} = \frac{p}{ 2-p}$$ since to get two of the same bit in $B_0$, the $b_i$ must go same-same, or same-different-different-same, or same-different-different-different-different-same, and so on.
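As a numerical sanity check (a sketch with an arbitrary seed and an illustrative value of $p$, neither from the answer): generate $b_i$ from such a Markov chain, build $B_0$ as in the question, and compare the empirical frequency of equal adjacent bits in $B_0$ with $p/(2-p)$:

```python
import random

rng = random.Random(1)   # fixed seed for reproducibility
p, N = 0.361, 10**6      # same-bit probability, number of steps
B0, b = [], 0
for i in range(N):
    if rng.random() >= p:    # with probability 1 - p, flip the bit
        b = 1 - b
    if b == i % 2:           # the sorting rule from the question
        B0.append(b)

same = sum(B0[k] == B0[k + 1] for k in range(len(B0) - 1))
print(same / (len(B0) - 1), p / (2 - p))  # the two numbers should be close
```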

To explain $\frac{ p}{2-p} = \frac{1}{2} - 0.28 = 0.22$ as in dvitek's analysis, we need $p = \frac{2}{ 1/0.22 +1} \approx 0.361$, i.e. there is a discrepancy of about $0.14$ to explain in the more natural statistic of how often two adjacent primes have distinct second-to-last bits.
