[Math] Heuristic argument for the prime number theorem

analytic-number-theoryprime numbers

Here is a bad heuristic argument for the prime number theorem. Let n be a positive integer and assume that PNT holds up to n. Then n itself is prime if and only if for each prime p<n the event p|n fails to hold. Assuming that all these events are independent, the probability that n is prime should be around $\prod_{p<n}(1-1/p)$. Minus the logarithm of this is approximately $\sum_{p<n}1/p$. Since the rth prime is about $r\log r$ (by PNT), this is about $\sum_{r<n/\log n}1/r\log r$, which is about $\log\log n$. Therefore, the probability that $n$ is prime is about $\exp(-\log\log n)$, which is $1/\log n$.

Here are a few criticisms of the above argument. The most glaring problem is that for the approximations to be valid, we need our estimates to be correct up to o(1), and they are not. For instance, it is known that $\sum_{p<n}1/p$ is not $\log\log n+o(1)$, but rather $\log\log n+M$, where M is the Meissel-Mertens constant. We can break this failure down into two subfailures. The first is that minus the logarithm of $\prod_{p<n}(1-1/p)$ differs from $\sum_{p<n}1/p$ by a non-zero constant (plus o(1)). The second is that minus the logarithm of $\prod_{p<n}(1-1/p)$ differs from $\log\log n$ by $\gamma+o(1)$, where $\gamma$ is the Euler-Mascheroni constant.

The second problem is more devastating, since it shows that the independence assumption is seriously flawed. (Everything I have said, by the way, is a well known and often pointed out observation.) My question is whether, despite all these problems, some kind of heuristic argument like this can be salvaged. For instance, it's clear that if p and q are two primes that are fairly large and fairly close, then there will be a repulsion between the events p|n and q|n. Can we say heuristically what the effects of these repulsions should be, and thereby understand where the $\gamma$ comes in?

Just to be clear, I am looking for a simple and non-rigorous argument that does not use the zeta function (except perhaps making use of the product formula in a very elementary way, but I'd rather avoid it altogether) that predicts that if PNT holds up to n then the probability that n is prime should be around $1/\log n$. I'm asking the question because I'm pretty sure the answer will be known, and pretty standard, to many people. It just isn't to me.

Best Answer

I'm not an expert in this area, but this may be a start.

Rather than $\prod_{p\lt n}$, you can use $\prod_{p\le \sqrt n}$.

$\log\log \sqrt n + \gamma \lt \log\log \sqrt n +\log 2 = \log\log n $

That gets you a little closer, since now you are off by $\log 2 - \gamma \approx 0.116$.

The heuristic probability that $n$ is prime is not

$$\prod_{p\lt n} (1-Pr(p|n))$$

It is the product of probabilities

$$\prod_{p\lt n} (1-Pr(p|n \text{ given no smaller prime divides } n))$$

For $p$ small, the term you get may be close to $(1-1/p)$, but I that's not the case for $p$ large.

For $\sqrt n \lt p$, the term corresponding to $p$ in the product is just $1$.

For $\sqrt[3]n \lt p \le \sqrt n$, if $p$ is the smallest prime dividing $n$, then $n/p$ must be prime, too. Perhaps that means that by strong induction, we should discount these terms by the probability $n/p$ is prime, about $1/\log \frac np$, so that those terms in the product are $(1-1/(p \log \frac np))$.

It looks like you get some sums/integrals if you try to extend this to more terms, and I don't know whether you can expect to get the desired accuracy at the end.