Does the dominant prime factor contribute about $62\%$ of the value of the logarithm of numbers

divisibilityelementary-number-theorylimitsnumber theoryprime numbers

Let $n = p_1^{a_1}p_2^{a_2}\cdots p_k^{a_k}$ be the prime factorization of $n$ for some primes such that $p_i < p_{i+1}$. We define the minor prime factor and the dominant prime factor of $n$ as the primes which has the smallest the largest contributions respectively to the value of $n$. Clearly, the dominant and the minor prime factors of $n$ is not necessarily its largest and smallest prime factors. E.g. if $n = 2^8 3^6 5^2 7^2$ then the largest and the smallest prime factors of $n$ are $7$ and $2$ respectively where as its dominant and the minor prime factors are $3$ and $5$ respectively since $3^6 = 729$ is greater than $2^8 = 256, 5^2=25$ and $7^2=49$.

Definition 1: The minor prime factor of $n$ is defined as the prime factor $p_m$ such that $p_m^{a_m} \le p_i^{a_i}$.

Definition 2: If a number as two or more distinct prime factors, then the dominant prime factor of $n$ is defined as the prime factor $p_d$ such that $p_d^{a_d} \ge p_i^{a_i}$ for any $i \ne d$.

Note: If a number has only one distinct prime factor then its minor and its dominant prime factors are the same i.e. $p_m = p_d$.

Since $\frac{a_1 \ln p_1}{\ln n} + \frac{a_2 \ln p_2}{\ln n} + \cdots + \frac{a_k \ln p_k}{\ln n} = 1$ we can say that each term in this sum is the contribution of the respective prime to the value of $\ln n$ and we can ask on and average what is the contribution of the minor/dominant prime to logarithm of a number.

Let $p_d$ be the dominant prime factor of $n$. Experimental data for all $n \le 4 \times 10^{8}$ and the average over several smaller intervals upto $10^{100}$ ; suggests that such that

$$
\lim_{x \to \infty}\frac{1}{x}\sum_{n = 1}^x \frac{a_d\ln p_d}{\ln n} \sim 0.62
$$

Conjecture: The dominant prime factor contributes approximately $62\%$ of the value of the logarithm of natural numbers.

Can this be proved or disproved?

Update 24 Feb 2023: Digging into literature, I read about the Golomb-Dickman Constant whose value $0.6243299$ is suspiciously close to observed mean value of the dominant prime factor. This constant is related to the largest prime factor (not the dominant prime factor) through the Dickman Function.

SageMath Code:

n      = 2
step   = target = 10^6
r_max1 = r_max2 = 0

while True:
    x = prime_factors(n)
    max1 = 1
    max2 = 1
    
    for factor in x:
        p = 1
        check = 1
        while True:
            check = check*factor
            if n%check == 0:
                p = p + 1
            else:
                p = p - 1
                check = check/factor
                if check > max1:
                    max1 = check
                    max2 = check
                break
    
    if len(x) == 1:
        max2 = 1

    # print(n,max)
    l     = ln(n).n()
    lmax1  = ln(max1).n()
    lmax2  = ln(max2).n()
    r_max1 = r_max1 + lmax1/l
    r_max2 = r_max2 + lmax2/l

    if n == target:
        print(n, 'lower_bound:', r_max2/n, 'upper_bound:', r_max1/n, 'mean:', (r_max2/n + r_max1/n)/2)
        target = target + step
    n = n + 1

Best Answer

I will prove that the limit exists and is precisely the Golomb-Dickman constant $\lambda$. First, let me sketch how one can show that the expected value of $\frac{\log(P_1(n))}{\log(n)}$, where $P_1(n)$ denotes the largest prime factor of $n$, is precisely $\lambda\approx0.62433$ via probability theory.

We say that an integer $n\geq1$ is $y$-smooth if every prime divisor of $n$ is $\leq y$. We also say that $n\geq1$ is $y$-powersmooth if every prime power divisor of $n$ is $\leq y$. Let's denote by $S(x,y)$ and $PS(x,y)$ the set of $y$-smooth and $y$-powersmooth numbers $\leq x$ respectively. It is a theorem of Dickman that $$F_S(\nu):=\lim_{x\to+\infty}\frac{1}{x}|S(x,x^{\nu})|=\rho(1/\nu)$$ where $\rho(t)$ is Dickman's function satisfying the differential equation $t\rho'(t)+\rho(t-1)=0$.


For large $x\gg0$, choose a random $tx\in[0,x]$ where $t\in[0,1]$ (that is, consider the random variable $xT\sim U(0,x)$ where $U(0,x)$ is the discrete uniform distribution that "looks" continuous for large $x$). What is the probability that $\frac{\log(P_1(tx))}{\log(tx)}\leq\nu$? Well $$tx\in S(x,x^\nu)\iff P_1(tx)\leq x^\nu\iff \frac{\log(P_1(tx))}{\log(tx)}\leq\frac{\nu\log(x)}{\log(tx)}=\frac{\nu}{1+\frac{\log(t)}{\log(x)}}\sim\nu$$ so, for large $x\gg0$, we have $P\left(\frac{\log(P_1(xT))}{\log(xT)}\leq\nu\right)\sim P(xT\in S(x,x^\nu))=\frac{1}{x}|S(x,x^\nu)|\sim F_S(\nu)$. This means that $F_S(\nu)$ is the cumulative distribution function of $\frac{\log(P_1(n))}{\log(n)}$ so we can compute the expected value as $$\lim_{x\to+\infty}\frac{1}{x}\sum_{n\leq x}\frac{\log(P_1(n))}{\log(n)}=\int_0^1\nu F_S'(\nu)d\nu=\int_0^1\frac{-\rho'(1/\nu)}{\nu}d\nu$$ $$=\int_0^1\rho\left(\frac{1}{\nu}-1\right)d\nu=\int_0^\infty\frac{\rho(t)}{(1+t)^2}dt=\lambda$$


Now we denote the greatest prime power divisor of $n\geq1$ as $Q_1(n)$. For example, $$Q_1(24)=Q_1(2^3\times3)=2^3=8\text{ and }Q_1(72)=Q_1(2^3\times3^2)=3^2=9$$ We want to compute the expected value of $\frac{\log(Q_1(n))}{\log(n)}$ is a similar manner as before. The key idea is to note that, for large $x\gg0$ and $tx\in[0,x]$ chosen uniformly, we have $$tx\in PS(x,x^\nu)\iff Q_1(tx)\leq x^\nu\iff\frac{\log(Q_1(tx))}{\log(tx)}\leq\frac{\nu\log(x)}{\log(tx)}\sim\nu$$ so, again, $P\left(\frac{\log(Q_1(xT))}{\log(xT)}\leq\nu\right)\sim\frac{1}{x}|PS(x,x^\nu)|$. Thus, if we had $\lim\limits_{x\to+\infty}\frac{|S(x,x^\nu)|-|PS(x,x^\nu)|}{x}=0$ (which I'll prove in just a moment), it would immediately follow that $$\lim_{x\to+\infty}\frac{1}{x}|PS(x,x^\nu)|=\lim_{x\to+\infty}\frac{1}{x}|S(x,x^\nu)|=F_s(\nu)=\rho(1/\nu)$$ $$\Rightarrow\lim_{x\to+\infty}\sum_{n\leq x}\frac{\log(Q_1(n))}{\log(n)}=\int_0^1\nu F_S'(\nu)d\nu=\int_0^\infty\frac{\rho(t)}{(1+t)^2}dt=\lambda$$ as desired. So it suffices to show that $\lim\limits_{x\to+\infty}\frac{|S(x,x^\nu)|-|PS(x,x^\nu)|}{x}=0$.


First observe that every $y$-smooth number is $y$-powersmooth. Thus we have that $$0\leq|S(x,y)\setminus PS(x,y)|=|S(x,y)|-|PS(x,y)|$$ For every prime $p\leq y$, denote $M_p(x,y)=\{p^{1+\lfloor\log_p y\rfloor}k\leq x\}$ the set of multiples of $p^{1+\lfloor\log_p y\rfloor}$ which are $\leq x$ where $p^{1+\lfloor\log_p y\rfloor}$ is the least power of $p$ that exceeds $y$.

Now observe that, if $n\in S(x,y)\setminus PS(x,y)$, then $n$ is a product of powers of primes $\leq y$ as $n=\prod_{p_i\leq y}p_i^{\epsilon_i}$ where $\epsilon_i$ can be zero and, at the same time, $\exists p_i\leq y$ such that $p_i^{\epsilon}>y\Rightarrow \epsilon_i\geq1+\lfloor\log_{p_i}y\rfloor$. But this means that $n\in S(x,y)\setminus PS(x,y)$ implies that $\exists p\leq y$ prime such that $n\in M_p(x,y)$ so $S(x,y)\setminus PS(x,y)\subseteq\bigcup_{p\leq y}M_p(x,y)$. Thus, we can bound $|S(x,y)|-|PS(x,y)|$ by $$|S(x,y)|-|PS(x,y)|=|S(x,y)\setminus PS(x,y)|\leq\left|\bigcup_{p\leq y}M_p(x,y)\right|\leq\sum_{p\leq y}|M_p(x,y)|$$ $$=\sum_{p\leq y}\left\lfloor\frac{x}{p^{1+\lfloor\log_p y\rfloor}}\right\rfloor\leq\sum_{p\leq y}\frac{x}{p^{1+\lfloor\log_p y\rfloor}}\leq\sum_{p\leq y}\frac{x}{p^{\log_p(y)}}=\frac{x}{y}\sum_{p\leq y}1=\frac{\pi(y)}{y}x$$ $$\Rightarrow\boxed{\therefore 0\leq\frac{|S(x,y)|-|PS(x,y)|}{x}\leq\frac{\pi(y)}{y}}$$ Finally, note that $\frac{\pi(y)}{y}\sim\frac{1}{\log(y)}\to0$ by the prime number theorem which, in particular, implies $\lim\limits_{x\to+\infty}\frac{|S(x,x^\nu)|-|PS(x,x^\nu)|}{x}=0$. In fact, we only need $\frac{\pi(y)}{y}\to 0$ (i.e. the zero natural density of prime numbers) which is not that hard to prove.$\ \square$

Related Question