Sum of Primes – How Soon Can We Represent a Number as the Sum of Two Primes?

analytic-number-theoryasymptoticselementary-number-theorynumber theoryprime numbers

Update 27-Apr-2022: Posted in MO

Goldbach's conjecture says that every even number can be represented as the sum of two primes. But how soon can we find such a representation. Taking $20 = 3 + 17 = 7 + 13$ if we start searching from the smallest prime we will encounter the pair $(3,17)$ first and say that $20$ satisfied Goldbach's conjecture. We do not have to go all the way to the pair $(7,13)$.

Definition 1: The smallest Goldbach representation of an even number $n$ is defined as the pair $(p_n,n-p_n)$ where $p_n$ is the smallest prime such that $n-p_n$ is also a prime.

I found that $p_n$ is actually much smaller than $n$. Experimental data for $n \le 230751000000 $ suggests an asymptotic relation of the form

$$
\frac{1}{n}\sum_{k \le n} p_k \sim \log n + c
$$

that for $c < 1.3054$ and $c$ is decreasing as $n$ increases.

So given $n$, we can ask how large can $p_n$. For even $n \le 3.6 \times 10^{10}$, the largest value of $p_n$ occurs as the following values of $n$ as shown below by maximal $(n, p_n)$ pairs.

(n,p_n)
(4,2)
(6,3)
(30,7)
(98,19)
(220,23)
(308,31)
(556,47)
(992,73)
(2642,103)
(5372,139)
(7426,173)
(43532,211)
(54244,233)
(63274,293)
(113672,313)
(128168,331)
(194428,359)
(194470,383)
(413572,389)
(503222,523)
(1077422,601)
(3526958,727)
(3807404,751)
(10759922,829)
(24106882,929)
(27789878,997)
(37998938,1039)
(60119912,1093)
(113632822,1163)
(187852862,1321)
(335070838,1427)
(419911924,1583)
(721013438,1789)
(1847133842,1861)
(7473202036,1877)
(11001080372,1879)
(12703943222,2029)
(21248558888,2089)
(35884080836,2803)

Question: Assuming Goldbach's conjecture or otherwise, is there any heuristic upper bound on $p_n$ in terms of $n$? Trivially Goldbach's conjecture says $p_n \le n/2$ but can we do better than this?

Best Answer

You could use a heuristic just based on the number of odd primes: The chance that a large odd n is prime is roughly 2 / ln n. You can estimate the chances that k consecutive large integers are all primes. You can then calculate for example “if I check all even integers from 10^(j-1)to 10^j, and given k, what are the chances that there is any n in that range so that n - p is non-prime for each of the first k primes”.

You get a bit more precision if you take into account small divisors of n. If n is a multiple of 3, for example, then n-3 is composite instead of a 2/ln n chance of being prime. But for all other p, n-p is not divisible by 3, making it a lot more likely that n-p is prime (3 / ln n instead of 2 / ln n).

On the other hand, if n is not a multiple of 3, then n-p for p≠3 is more likely to be divisible by 3 then a random large integer. So n which are not divisible by 3, 5, 7 are more likely to produce a longer sequence of primes with n-p composite. That will make your first heuristic underestimate the maximum. If you look at your numbers, none above 30 are divisible by 3, and only one after 220 is divisible by 5.

If n is not divisible by some (small) prime q, and we take a random prime p > 2, p ≠ q, then there are q-1 possible values n modulo q, and q-1 possible values p modulo q. n-p is divisible by q in one in q-1 cases instead of one in q cases if n-p was a random integer.

The probability that n-p is prime for large even n, odd p, would usually be 2 / ln n. (1 / ln n for random integers, twice as high because n-p is odd). If n is divisible by q, then the chance that n-p is NOT divisible by q is (q-2) / (q-1) instead of (q-1) / q for random numbers, so the chance that n-p is prime is multiplied by (q^2 - 2q) / (q-1)^2 = 1 - 1 / (q-1)^2. This factor is 3/4 for q = 3, 15/16 for q = 5, 35/36 for q = 7, 120/121 for q = 11. The product of these factors for many q is about 0.66. So the chance that n-p is prime is not 2 / ln n but about 1.32 / ln n. On the other hand, there are fewer n which are not divisible by 3, 5, 7, 11 etc.

If the chance of n-p being prime is 1.32 / ln n, the chance of n-p being composite = 1 - 1.32 / ln n, then the chance that n-p is composite for k different p is about (1 - 1.32 / ln n)^k ≈ exp(-1.32k / ln n), the chance that one p is the matching prime for n is ≈ 1 - exp (1.32k / ln n). If we check M values for n, the chance that we always find a matching prime among the first k primes is (1 - exp (1.32k / ln n))^M. For large k, this is about exp (-M * exp (1.32k / ln n)). (Haven't tested these formulas, so there might be mistakes).

If you decide on say M = n / 2 ln n, which is roughly the number of primes from n/2 to n, and decide you want a probability of say 0.5, then you can calculate k easily. For example: ln 0.5 = -M exp (1.32k / ln n), ln 2 * n / 2 ln n = exp (1.32k / ln n), 1.32 k / ln n = ln (ln 2 * n / 2 ln 2n), k = ln n * ln (n ln 2 / 2 ln 2n) / 1.32. Very roughly.

And that's the number of primes. The k'th prime is very roughly k ln k. So the prime required would be about c * (ln n)^2 * ln ln n and c = 1.2 gives quite reasonable results (the $p_n$ are not a very smooth function).

Related Question