[Math] Density of numbers having large prime divisors (formalizing heuristic probability argument)

nt.number-theorypr.probabilityprime numbers

I want to prove that the set of natural numbers n having a prime divisor greater than $\sqrt{n}$ is positive.

I have a heuristic argument that this density should be $\log 2$, which is approximately 0.7, but I am not sure how this could be converted to a formal argument.

For any x, the probability that x is prime is approximately $1/ \log x$ (By the prime number theorem). Further, the probability that n is a multiple of x is approximately $1/x$. These are "independent" so the probability that n is a multiple of x and x is prime is approximately $1/x\log x$.

We know that n can have at most one prime divisor greater than $\sqrt{n}$, so the probability that n has a prime divisor greater than $\sqrt{n}$ can be approximated by the integral:

$$\int_{\sqrt{n}}^n \frac{dx}{x \log x} = [\log (\log x)]_{\sqrt{n}}^n = \log 2$$

Can this be made precise in terms of densities? How would the error terms be handled? Has this or a similar result already been proved?

ADDED LATER: The proofs below resolve this question, and they also seem to show that the density of numbers n with a prime divisor greater than $n^\alpha$ is $-\log \alpha$ for $1 > \alpha \ge 1/2$. My question: is the result also valid for $0 < \alpha < 1/2$? For such $\alpha$, we could have more than one prime divisor, so the simple counting above doesn't work; we need to use a sieve that subtracts the multiple contributions occurring from numbers that have more than one such prime divisor. My guess would be that asymptotically, this wouldn't matter, but I'm not sure how to formally show this.

Best Answer

This is actually fairly straightforward, and reduces to the fact that

$ \sum_{p \text{prime}} \frac 1p = \log \log x + C + o(1)$

for some constant $C$. To see how to apply this to the original problem, let $p$ denote the largest prime divisor of $n$, and write $n = pz$. Then $p \ge \sqrt{n}$ if and only if $z \le p$. Thus the number of such $n \le x$ is

$\sum_{p \le x} \sum_{z \le \min(p,x/p)} 1$ which breaks up into a main term

$\sum_{p \ge \sqrt{x}} \lfloor \frac xp \rfloor$ plus a smaller term $\sum_{p \le \sqrt{x}} p \le \sqrt{x} \pi(\sqrt x) = o(x)$, which is therefore negligible compared with the first term. The main term is taken care of by the equation I cited at the beginning, using the fact that $\pi(n) = o(n)$ and finally that

$\log \log x - \log \log \sqrt x = \log 2$.

Related Question