Explaining Property of Second-to-Last Bits of Primes


For $i \geq 0$, let $p_i$ denote the $i$-th prime: $p_0 = 2, p_1 = 3, \ldots$ Then $b_i$ denotes the second-to-last bit of $p_i$, i.e. $b_i = \left\lfloor p_i/2 \right\rfloor \bmod 2.$

The sequences $B_0$ and $B_1$ are constructed by the following algorithm: if $b_i$ is equal to $i \bmod 2$, append $b_i$ to $B_0$; otherwise, append $b_i$ to $B_1$. Thus $$\begin{array}{l}
B_0 =
1010101101010101010101101010101010011001001111001101001011001010\ldots,\\
B_1 =
1101010101011001011010101100101010101010010101011101010101010101\ldots
\end{array}$$

In essence, both sequences are the alternating pattern $\ldots1010\ldots$, occasionally interrupted by runs of two or more identical bits. For example, the first $10^6$ bits of $B_1$ do not even contain ten consecutive zeros. What explains this phenomenon?
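For reference, the construction above can be reproduced with a short script (a sketch; the `primes` helper is just a naive trial-division generator and the variable names are mine, not part of the question):

```python
def primes(count):
    """Yield the first `count` primes by naive trial division."""
    found = []
    n = 2
    while len(found) < count:
        if all(n % q for q in found if q * q <= n):
            found.append(n)
            yield n
        n += 1

B0, B1 = [], []
for i, p in enumerate(primes(10000)):
    b = (p // 2) % 2                      # second-to-last bit of p_i
    (B0 if b == i % 2 else B1).append(b)  # the sorting rule from the question

print("".join(map(str, B0[:64])))
print("".join(map(str, B1[:64])))
```

The two printed lines should match the prefixes of $B_0$ and $B_1$ shown above.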

Best Answer

I think the phenomena you observe are explained more by the way you construct $B_0$ and $B_1$ than by the properties of primes (though properties of primes seem to play a role). If we suppose instead that $b_i$ were random, obtained by a coin flip, it's easy to see that for each bit of $B_0$ or $B_1$, the probability that the next bit is the same is $\frac{1}{3}$, and the probability that the next bit is different is $\frac{2}{3}$.

Indeed, say $b_i \equiv i \mod 2$ for some $i$, so at time $i$ we have just added $b_i$ to $B_0$.

Then if the next $k-1$ bits are not congruent to their position mod $2$, followed by the $k$th bit congruent to its position, $i+k$, mod $2$, we next append the bit $i+k \mod 2$ to $B_0$. This occurs with probability $\frac{1}{2^k}$, and in each sequence this occurs for exactly one value of $k$.

So we get a different bit with probability $\frac{1}{2} + \frac{1}{2^3} + \frac{1}{2^5} + \dots = \frac{2}{3}$, and the same bit with probability $\frac{1}{2^2} + \frac{1}{2^4} + \frac{1}{2^6} + \dots = \frac{1}{3}$. The $B_1$ case is identical.
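This prediction is easy to check empirically (an illustrative sketch, with coin flips standing in for the prime-derived bits; seed and sample size are arbitrary):

```python
import random

# Replace the prime-derived bits b_i with fair coin flips, run the same
# B0 construction as in the question, and measure how often adjacent
# bits of B0 agree. The analysis above predicts 1/3.
rng = random.Random(0)  # fixed seed for reproducibility
N = 10**6
B0 = []
for i in range(N):
    b = rng.randrange(2)
    if b == i % 2:          # the sorting rule from the question
        B0.append(b)

same = sum(B0[k] == B0[k + 1] for k in range(len(B0) - 1))
print(same / (len(B0) - 1))  # close to 1/3
```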

So the probability of a run of $10$ consecutive zeroes starting at any given point is $\frac{1}{2} \cdot \frac{1}{3^9}$. However, by a classical argument, the expected time before seeing the first run of $10$ consecutive zeroes is a little longer than the inverse of that: about $3^{10}-1=59048$.
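The waiting time can be pinned down by a first-step analysis (a sketch; the state bookkeeping below is my own, and boundary conventions can shift the answer by $O(1)$). Writing $T_j$ for the expected number of further bits needed when the current run of zeroes has length $j$, and $E_1$ for the expectation when the current bit is a one, the chain with same-bit probability $\frac{1}{3}$ gives $T_j = 1 + \frac{1}{3}T_{j+1} + \frac{2}{3}E_1$ and $E_1 = \frac{3}{2} + T_1$, with $T_{10} = 0$:

```python
from fractions import Fraction

n = 10
third = Fraction(1, 3)
# Write T[j] = alpha + beta * T1 and propagate backwards from T[n] = 0,
# using T[j] = 1 + (1/3) T[j+1] + (2/3) E1 and E1 = 3/2 + T1.
alpha, beta = Fraction(0), Fraction(0)
for j in range(n - 1, 0, -1):
    alpha = 1 + third * alpha + 2 * third * Fraction(3, 2)
    beta = third * beta + 2 * third
T1 = alpha / (1 - beta)        # solve T1 = alpha + beta * T1
E1 = Fraction(3, 2) + T1
# First bit is 0 or 1 with probability 1/2 each:
wait = 1 + Fraction(1, 2) * T1 + Fraction(1, 2) * E1
print(float(wait))  # 59047.75, agreeing with 3**10 - 1 = 59048 up to O(1)
```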

So we waited about 17 times longer than one would expect before finding a run of ten zeroes. (But you had four choices, zeroes or ones in $B_0$ or $B_1$, when picking such a long run.) That's not completely surprising, but it's a lot less strange than in the naive random model, where the expected wait time is $2^{11}-1$, so there you would have waited about $489$ times longer than expected.


More generally, if we replace $b_i$ with a Markov chain that takes the same value on adjacent steps with probability $p$, then the probability that $B_0$ takes the same value on adjacent steps should be $$p^2 + p^2(1-p)^2 + p^2 (1-p)^4 + \dots = \frac{ p^2}{ 1- (1-p)^2} = \frac{p^2 }{ 2p -p^2} = \frac{p}{ 2-p}$$ since to get two of the same bit in $B_0$, the $b_i$ must go same-same, or same-different-different-same, or same-different-different-different-different-same, and so on.
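As a numerical sanity check (a sketch with an arbitrary seed and an illustrative value of $p$, neither from the answer): generate $b_i$ from such a Markov chain, build $B_0$ as in the question, and compare the empirical frequency of equal adjacent bits in $B_0$ with $p/(2-p)$:

```python
import random

rng = random.Random(1)   # fixed seed for reproducibility
p, N = 0.361, 10**6      # same-bit probability, number of steps
B0, b = [], 0
for i in range(N):
    if rng.random() >= p:    # with probability 1 - p, flip the bit
        b = 1 - b
    if b == i % 2:           # the sorting rule from the question
        B0.append(b)

same = sum(B0[k] == B0[k + 1] for k in range(len(B0) - 1))
print(same / (len(B0) - 1), p / (2 - p))  # the two numbers should be close
```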

To explain $\frac{ p}{2-p} = \frac{1}{2} - 0.28 = 0.22$ as in dvitek's analysis, we need $p = \frac{2}{ 1/0.22 +1} \approx 0.361$, i.e. there is a discrepancy of about $0.14$ to explain in the more natural statistic of how often two adjacent primes have distinct second-to-last bits.
