Euler Phi Function – Almost Exact Formula for Sum of Euler Phi Function

arithmetic-functionsnumber theorysummationtotient-function

Solving a problem, I needed a formula for $\Phi(n)=\sum_{k=1}^n \varphi(k)$, where $\varphi(k)$ is Euler's notorious totient function, that counts the number of numbers between $1$ and $k$ that are coprime with $k$. I looked it up online and found that the best we were able to get until now is
$$\Phi(n)=\sum_{k=1}^n \varphi(k)=\frac{3}{\pi^2}n^2+\mathcal{O}\left(n(\log{}n)^{\frac{2}{3}}(\log\log{}n)^{\frac{4}{3}}\right)$$
which becomes, when calculating:
$$\Phi(n)\simeq\frac{3}{\pi^2}n^2$$
and the error is quite small.

Howewer I needed an exact formula, and so, with some combinatorics and probability, I came up with the following:
$$\Phi(n)=\sum_{k=1}^n \varphi(k)\simeq\left({n\choose 2}+1\right)\prod_{p\ prime}\left(1-\frac{1}{n^2}{\left\lfloor{\frac{n}{p}}\right\rfloor}^2\right)$$
The proof is simple but quite long, you can find it here: https://www.overleaf.com/read/xxjmgmcwqfgb.

For simplicity I will call the product function $L(n) \triangleq\left({n\choose 2}+1\right)\prod_{p\hspace{1pt}prime}\left(1-\frac{1}{n^2}{\left\lfloor{\frac{n}{p}}\right\rfloor}^2\right)$ from now on.

It is interesting to notice that the product taken over all prime is the same as the product taken only over the primes s.t. $p<n$, and this makes the calculation a lot easier.

Unfortunately (and I have no idea why), $L(n)$ still doesn't compute the exact value of $\Phi(n)$, but the error it provides is a lot smaller than the one of $\frac{3}{\pi^2}n^2$, here are some examples:

$$ \Phi(2064)=1\,295\,506\hspace{2cm}\Phi(2064)-L(2064)=58.0496 \hspace{2cm}\Phi(2064)-\frac{3}{\pi^2}2064^2=592.093 $$
the error with $L$ is $ 0.0044$ %, the error with $\frac{3}{\pi^2}n^2$ is $0.0457$ %.

$$ \Phi(10463)=33\,282\,718\hspace{2cm}\Phi(10463)-L(10463)=-8.35606 \hspace{2cm}\Phi(10463)-\frac{3}{\pi^2}10463^2=6500.06 $$
the error with $L$ is $ -2.51063 \cdot 10^{-5}$ %, the error with $\frac{3}{\pi^2}n^2$ is $0.0195298%$ %.

There are even values where $\Phi(n)-L(n)\ll 1$, for example if $n=502$ we have:
$$ \Phi(502)=76\,698\hspace{2cm}\Phi(502)-L(502)=0.0215688 \hspace{2cm}\Phi(502)-\frac{3}{\pi^2}502^2=97.9693 $$
where the error for $L$ is $2.81217 \cdot 10^{-5} $ %

I have no idea how to interpret these results. If someone knows why the formula fails I'd be delighted to hear.

EDIT:

As @Peter Košinár pointed out, the correct formula should be
$$\sum_{k=1}^N \varphi(k)\simeq\ \frac{1}{2}\left(1+N^2\prod_{p\ prime}\left(1-\frac{1}{N^2}{\left\lfloor{\frac{N}{p}}\right\rfloor}^2\right)\right)$$

Best Answer

This is a very nice problem! I see two separate problems in the proof; one easy to fix, one not so much. The "big" one explains why there must be difference between $L(n)$ and the correct value of the sum. On the other hand, the "easy" one actually made the expression closer to the expected values; thus adding to the confusion.


We'll start with the "easy" one: The proof works by expressing a specific probability in two different ways. Since we are looking for exact match rather than just some kind of asymptotic behaviour, we need to be very careful to make sure it is indeed the same scenario being considered in both cases.

The second part of the proof considers all $N^2$ pairs of integers from the set $\{1,2,\ldots,N\}$ (the two integers in each pair are considered completely independently). On the other hand, the first part of the proof originally considered only pairs distinct integers from the same set, without paying attention to their order. This disregarded the pairs $(i,i)$ completely; even though the numerator of the fraction counted the pair $(1,1)$ as "winning". The attempted fix included the extra "winning" pair $(1,1)$ in the denominator, thus arriving at the expression

$$P(N)\overset{?}{=}\frac{\sum_{k=1}^N \varphi(k)}{\binom{N}{2}+1}$$

This is not sufficient, though, since it ignores the existence of the other ordered pairs of the form $(i,i)$. The correct expression for $P(N)$ looks as follows: $$P(N)=\frac{1}{N^2}\left(2\sum_{k=1}^N \varphi(k)-1\right)$$

The sum counts the number of winning pairs $(i,j)$ with $1\leq i\leq j\leq N$; this includes the special pair $(1,1)$. Since every pair can also be reversed and yield another valid pair, the sum is doubled; the extra $(-1)$ accounts for the $(1,1)$ being identical to its reversal.

Now, if we apply this correct value of $P(N)$ and equate it to the quantity found by the second approach: $$P^*(N)=\prod_{p\ prime}\left(1-\frac{1}{N^2}{\left\lfloor{\frac{N}{p}}\right\rfloor}^2\right)$$ we'd get $$L^*(N)=\frac{1}{2}\left(1+N^2\prod_{p\ prime}\left(1-\frac{1}{N^2}{\left\lfloor{\frac{N}{p}}\right\rfloor}^2\right)\right)$$

As indicated at the beginning, this function is actually worse estimate of $\sum_{k=1}^N\varphi(k)$ than the one provided by incorrect reasoning!


Now, the "big" issue is a lot more subtle and lies in the phrase "But being divisible by a prime number or being divisible by another prime number are independent events.". Is that really the case?

Consider a simpler problem: We are given two distinct primes, $p$ and $q$ and look at positive integers up to $N$. What is the probability of a (uniformly) randomly chosen number not being divisible by $p$? Clearly, it's $\left(1-\frac{1}{N}\left\lfloor\frac{N}{p}\right\rfloor\right)$. The same applies of $q$, of course. But what is the probability of a number being divisible by neither $p$ nor $q$? Is it the product of these two probabilities (as independence would require)? Sometimes yes, but most of the time no.

For a specific example, consider $N=20$ and $p=7$ and $q=11$. The probability for non-divisibility by $p$ is $\frac{18}{20}$, and by $q$ is $\frac{19}{20}$. However, the probability of "neither $p$, nor $q$" is $\frac{17}{20}$ rather than $\frac{18}{20}\times\frac{19}{20}$. It's not difficult to see why: if $N$ is smaller than $p\times q$, the two divisibility conditions are actually mutually exclusive (which is as far from independence as it can be). Note that even $N$ being greater than $p\times q$ is not sufficient to provide independence; that only happens when $N$ is a multiple of $p\times q$.

Yes, asymptotically, as $N$ grows, the probability will converge to the product of the two probabilities, but for a fixed $N$, there is no guarantee of equality. The whole thing gets even more complicated if we consider more than two primes; and even more so if we are looking at all the primes up to $N$, where even increasing $N$ and looking at asymptotic behaviour is complicated by further and further terms appearing in the product.

The same kind of a problem appears when considering pairs of integers and their common divisors; not-sharing-divisor-$p$ and not-sharing-divisor-$q$ are not really independent.


It is not difficult to see that the mistake of considering the probabilities as independent when they are not over-estimates the correct values; so the values given by $L(N)$ should be overestimates of the desired sums. The values of $L(N)$ are not always so; precisely because of the "easy" mistake. Explaining why those incorrect estimates are better would most likely require much deeper analysis of the interplay between primes.

Related Question