Major errors in an inclusion-exclusion application to counting twin primes in a certain interval.

elementary-number-theoryinclusion-exclusionprime numbersprogrammingtwin primes

Let $a,b,n,q$ all be odd numbers, and let $p_N$ be the $N$th odd prime, throughout.

Formula for the number $H(a,b,n) = \# \{a \lt x \lt b : x^2 – 1 = 0 \pmod n, x \text{ even}\}$. This is true since the pairs are in one-to-one correspondence with the averages of the two pair components.

Therefore, by inclusion-exclusion, we get the number $G(a,b) = \# \{ a \lt x \lt b : x^2 – 1 = 0 \pmod q$ for some $q \in Q\}$, where $Q$ is a finite set of primes. We have:

$$
G(a,b) = \sum_{u \ \mid \ \prod Q \\ u \neq 1} (-1)^{\Omega(u)}H(a,b,u)
$$

where $\Omega$ is prime Omega from number theory or $\Omega(p_1^{k_1} \cdots p_r^{k_r}) = \sum_{i = 1}^r k_i$. But in this case, since $Q$ consists of distinct primes we have $\Omega(u) = \omega(u)$ so you can use either.

And therefore, the number $F(a,b) = \# \{ a \lt x \lt b: x^2 – 1 \text{ is a unit } \pmod q, \forall q \in Q\}$ is given by:

$$
F(a,b) = \dfrac{b – a}{2} + G(a,b)
$$

Since for $a,b$ odd there are $\dfrac{b – a}{2}$ total pairs of consecutive odd numbers $o, o+2 \in [a,b]$.


Now, if we let $a = p_{N}, b = p_{N}^2$, then no prime $q\lt p_N$ divides the end points and so $H(p_N, p_{N+1}^2, u)$ has the surprisingly simple form (for $u$ comprised of primes $q \lt p_N$):

$$
H(a,b,u) = 2 \left( \lfloor\dfrac{b}{u}\rfloor – \lfloor\dfrac{b}{2u} \rfloor -\lfloor\dfrac{a-1}{u}\rfloor + \lfloor\dfrac{a – 1}{2u} \rfloor\right)
$$

Let $Q = $ the first $N-1$ odd primes. Then $F(p_N, p_N^2)$ should count the number of twin prime averages in the interval $(p_N, p_N^2)$. However, here is some code that performs this inclusion-exclusion formula:

#import matplotlib.pyplot as plt
from sympy import primepi, isprime, primerange, divisors, primeomega
from math import floor, sqrt

N = 1000

def H(a,b,u): 
    res = 2*(floor(b/u) - floor(b/(2*u)) - floor((a-1)/u) + floor((a-1)/(2*u)))
    return res

def inclusion_exclusion(F, Q):
    prodQ = 1
    for q in Q:
        prodQ *= q
    sum = 0
    for u in divisors(prodQ):
        if u != 1:
            sum += (-1)**primeomega(u) * F(u)
    return sum

primes = list(primerange(3, N))
    
for n in range(1, len(primes) + 1):
    Q = primes[:n]
    p_n = primes[n]
    formula0 = (p_n**2 - p_n)//2 + inclusion_exclusion(lambda u: H(p_n, p_n**2, u), Q)
    
    #formula = (p_N**2 - p_N) - (primepi(p_N**2) - primepi(sqrt(p_N - 1)) - 1)
    
    actual_count = 0
    for x in range(p_n, p_n**2):
        if isprime(x - 1) and isprime(x + 1):
            actual_count += 1
            
    print(formula0 - actual_count)

However, the code is printing a growing error value. It has to be all $0$'s for the formula to be correct.

I cannot figure out where my math is going wrong. I thought the code was simple enough so I've included it here.

So, I need some help with the math of it. Did I correctly apply inclusion-exclusion, for example.


Here is a fully worked example to help us out:

$$
a = 7, b = 7^2, Q = \{3,5\} \\
H(a,b,3) = 2([49/3] -[49/6] – [6/3] + [6/6]) = 14\\
H(a,b,5) = 2([49/5] -[49/10] – [6/5] +[6/10]) = 8 \\
H(a,b,15) = 2([49/15] – [49/30] – [6/15] + [6/30]) = 4 \\
G(a,b) = H(a,b,3) + H(a,b,5) – H(a,b,15) = 18 \\
F(a,b) = \dfrac{b – a}{2} + G(a,b) = 3 \\
$$

This formula is not working, as clearly the number should be $\#\{12,18,30,42\} = 4$. So it is off by $1$ in this case. It is close, but no cigar. I can't see where the problem is in my derivation.


I think a problem can be seen here:

$$
H(5,25,3) = 2([25/3] – [25/6] – [4/3] + [4/6]) = 2(8 – 4 – 1 + 0) = 6
$$

                       .......
      _____   _____          ________          ________    ........
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
  _____           _______          ________          _______
         ......                          ........

____'s count the odd pairs divided by $3$. …..'s count those that are not.

True, there are 6 ____'s between 5 and and 25, but 24 is not a twin prime average.

So the formula should be:

$$
H(a, b, u) = 2\left(\lfloor \dfrac{b}{u}\rfloor – \lfloor\dfrac{b}{2u}\rfloor – \lfloor \dfrac{a-1}{u}\rfloor + \lfloor\dfrac{a-1}{2u} \rfloor\right) + \begin{cases} 0 \text{ if }, p_N^2 = 1 \pmod u \\ 1 \text{ otherwise}\end{cases}
$$

whenever $a = p_N, b = p_N^2$, $u$ is comprised of the first $N-1$ odd primes.

Best Answer

I looked at the overcount for the case $a = 7, b = 49$ and it turns out that $H(a,b,15)$ doesn't count odd pairs $(25,27)$ nor $(33,35)$ which explains why the count is off by two when I run the up-to-date code.

The reason is because:

$$ (x^2 = 1 \pmod 3) \wedge (x^2 = 1 \pmod 5) \iff \\ (x = \pm 1 \pmod 3) \wedge(x = \pm 1 \pmod 5) \iff \\ (x^2 = 1 \pmod {15}) \\ \vee (x = -1 \pmod {3} \wedge x = 1 \pmod {5}) \\\vee (x = 1 \pmod {3} \wedge x = -1 \pmod {5}) $$

$H(a,b,15)$ only counts the first part, the parts where $x = -1 \pmod {3}$ and $x = 1 \pmod {5}$ or vise versa (swapping $3, 5$) are not counted and those are precisely the pairs $(25,27)$ and $(33,35)$.

The good news is that those $\vee$'s are exclusive or's so once we come up with a formula for the rest we can add/subtract it in. The bad news is I'm not sure how to count those, for general $q_1\cdots q_n$. It might be easier, knowing each $q_i$ is distinct.