How many distinct prime factors are there in the numbers between two primes

analytic-number-theoryasymptoticsnumber theoryprime numbers

Let $p$ and $q$ be two consecutive primes and $f(p)$ be the number of distinct prime factors of the product $(p+1)(p+2)\cdots (q-1)$. Thus $f(p)$ is a count of the number of distinct primes factors that make up a prime gap.

Question: What is asymptotic order of $\sum_{p \le x}f(p)$?

Experimental data for $p < 10^{10}$ suggests that this could be $\sim x\log \log x – x$.

Source code

import numpy
p = 2
i = 0
s = 0
target = 10^6
step = 10^6

while True:
    i = i + 1
    q = next_prime(p)
    r = p + 1
    x = prime_factors(r)
    r = r + 1
    while r < q:
        x = x + prime_factors(r)
        r = r + 1
    s = s + len(numpy.unique(x))
    if i > target:
        print i,s,(s/q).n()
        target = target + step
    p = q

Best Answer

I expect that $$\sum_{p \leqslant x} f(p) = x\log \log x - x\log \log \log x + O(x)\,, \tag{$\ast$}$$ but I don't see how that could be proved without knowing much stronger bounds on prime gaps than we currently do. Since $\log \log \log x$ grows very very slowly, this would not easily be distinguished from $x\log \log x - x$ empirically.

It is not difficult to show that $$\sum_{p \leqslant x} f(p) \leqslant x\log \log x - x\log \log \log x + C\frac{x}{\log \log x} \tag{1}$$ for a suitable constant $C$ using the known bounds for prime gaps. Proving lower bounds is harder.

To estimate the sum, let's "switch the order of summation". Instead of counting the number of primes having a multiple in each composite run (the composite numbers between two successive primes), for each prime count the number of consecutive runs starting at or below $x$ in which the prime has a multiple.

Things are easier to write down if we consider only the multiples $\leqslant x$. This doesn't make a difference for $(1)$, since by a result of Hoheisel subsequently improved by various people, the length of the last composite run to be considered is at most $x^{\theta}$ for some $\theta < 1$. By the trivial bound $\omega(n) \ll \log n$, ignoring the numbers $> x$ in that run introduces an $O(x^{\theta}\log x)$ error, comfortably smaller than the $O\bigl(\frac{x}{\log \log x}\bigr)$ term in $(1)$.

Then for each prime $p \leqslant x$, the number of composite runs in which it has a multiple that we count is bounded above on the one hand by $\pi(x)-1$ (since there are at most that many nonempty runs we consider), and on the other hand by $\bigl\lfloor \frac{x}{p}\bigr\rfloor - 1$ since $p$ has just that many multiples $\leqslant x$ excepting $p$ itself. Taking the first bound for small primes and the second one for larger ones, we obtain (for not too small $x$) \begin{align} \sum_{p \leqslant x} f(p) &\leqslant \sum_{p \leqslant \log x} \bigl(\pi(x)-1\bigr) + \sum_{\log x < p \leqslant x} \biggl(\biggl\lfloor \frac{x}{p}\biggr\rfloor - 1\biggr) + O\bigl(x^{\theta}\log x\bigr) \\ &\leqslant \pi(x)\pi(\log x) + x \sum_{\log x < p \leqslant x} \frac{1}{p} + O\bigl(x^{\theta}\log x\bigr) \\ &= x\biggl(\log \log x - \log \log \log x + O\biggl(\frac{1}{\log \log x}\biggr)\biggr) + \pi(x)\pi(\log x) + O\bigl(x^{\theta}\log x\bigr) \\ &= x\log \log x - \log \log \log x + O\biggl(\frac{x}{\log \log x}\biggr) \end{align} by Mertens's second theorem and the Chebyshev bounds. (And we can by these means find an explicit $C$ ifwe wish to do so.)

In order to discuss lower bounds for the sum, let $G(x)$ denote the largest prime gap for which the smaller prime doesn't exceed $x$. Then it is clear that for primes $p > G(x)$ the number of composite runs in which $p$ has a multiple is precisely the number of composite multiples of $p$ not exceeding $x$ (plus maybe one), since such a prime cannot have more than one multiple in a single run. Hence we have $$\sum_{p \leqslant x} f(p) \geqslant \sum_{G(x) < p \leqslant x} \biggl(\biggl\lfloor \frac{x}{p}\biggr\rfloor - 1\biggr) = x\log \log x - x \log \log G(x) + O\biggl(\frac{x}{\log G(x)}\biggr)\,.$$ If, as is widely believed, we have $G(x) \in O\bigl((\log x)^k\bigr)$ for some exponent $k$ (the case $k = 2$ is Cramér's conjecture), then $\log \log G(x) = \log \log \log x + O(1)$, and $(\ast)$ follows. If on the other hand $G(x)$ can be as large as $x^{\varepsilon}$ for some $\varepsilon > 0$, then the arguments above aren't even sufficient to establish the principal term $x\log \log x$.

Related Question