[Math] Find the sum of all primes smaller than a big number

computational complexitynumber theory

I need to write a program that calculates the sum of all primes smaller than a given number $N$
($10^{10} \leq N \leq 10^{14} $).

Obviously, the program should run in a reasonable time, so $O(N)$ is not good enough.
I think I should find the sum of all the composite numbers smaller than $N$ and subtract it from $1+2+…+N$, but I'm trying that for a long time with no progress.

Best Answer

Lucy_Hedgehog's post on Project Euler forum about problem 10 (visible after logging in):

Here is a solution that is more efficient than the sieve of Eratosthenes. It is derived from similar algorithms for counting primes. The advantage is that there is no need to find all the primes to find their sum.

The main idea is as follows: Let $S(v,m)$ be the sum of integers in the range $2..v$ that remain after sieving with all primes smaller or equal than m. That is $S(v,m)$ is the sum of integers up to $v$ that are either prime or the product of primes larger than $m$.

$S(v, p)$ is equal to $S(v, p-1)$ if $p$ is not prime or $v$ is smaller than $p \cdot p$.
Otherwise ($p$ prime, $p \cdot p \leq v$) $S(v,p)$ can be computed from $S(v,p-1)$ by finding the sum of integers that are removed while sieving with $p$. An integer is removed in this step if it is the product of $p$ with another integer that has no divisor smaller than $p$. This can be expressed as

$S(v,p)=S(v,p−1)−p(S(\frac{v}{p},p−1)−S(p−1,p−1)).$

Dynamic programming can be used to implement this. It is sufficient to compute $S(v,p)$ for all positive integers $v$ that are representable as $\lfloor \frac{n}{k} \rfloor$ for some integer $k$ and all $p \leq \sqrt{v}$.

Python:

def P10(n):
    r = int(n**0.5)
    assert r*r <= n and (r+1)**2 > n
    V = [n//i for i in range(1,r+1)]
    V += list(range(V[-1]-1,0,-1))
    S = {i:i*(i+1)//2-1 for i in V}
    for p in range(2,r+1):
        if S[p] > S[p-1]:  # p is prime
            sp = S[p-1]  # sum of primes smaller than p
            p2 = p*p
            for v in V:
                if v < p2: break
                S[v] -= p*(S[v//p] - sp)
    return S[n]

The complexity of this algorithm is about $O(n^{0.75})$ and needs 9 ms to find the solution. Computing the sum of primes up to different bounds $n$ gives:

n = $2 \cdot 10^7$: 12272577818052 0.04 s
n = $2 \cdot 10^8$: 1075207199997334 0.2 s
n = $2 \cdot 10^9$: 95673602693282040 1 s
n = $2 \cdot 10^{10}$: 8617752113620426559 6.2 s
n = $2 \cdot 10^{11}$: 783964147695858014236 34 s
n = $2 \cdot 10^{12}$: 71904055278788602481894 3 min

I also have a C++ version of this algorithm. This one solves the problem in 700μs. It needs 10 hours to compute the sum of primes up to $10^{17}$ which is 129408626276669278966252031311350.

It is also possible to improve the complexity of the algorithm to $O(n^{2/3})$, but the code would be more complex.

I've checked with pen and Python - according to me, it works as described:

>>> P10(2*10**10)
8617752113620426559 in 8 s

>>> P10(2*10**11)
783964147695858014236 in 45 s

>>> P10(2*10**12)
71904055278788602481894 in 3 min 19 s
Related Question