[Math] Least prime in an arithmetic progression and the Selberg sieve

multiplicative-number-theorynt.number-theorysieve-theory

My question concerns a technical step in the proof of Linnik's theorem on the least prime in an arithmetic progression, as presented in Chapter 18 of Iwaniec-Kowalski: Analytic number theory.

The proof uses certain weights $\theta_b$ coming from the theory of the Selberg sieve. The sequence is supported on square-free numbers up to $y$ coprime with $q$ (the modulus of the arithmetic progression), and its main feature is, cf. (18.19)-(18.23) in the book,
$$ |\theta_b|\leq 1,\qquad \theta_1=1, $$
$$ 0<\sum_{b_1,b_2}\frac{\theta_{b_1}\theta_{b_2}}{[b_1,b_2]}\leq\frac{q}{\varphi(q)\log y}. $$
The corresponding Selberg upper sieve coefficients are defined by
$$ \sigma_m:=\sum_{[b_1,b_2]=m}\theta_{b_1}\theta_{b_2}, $$
so that with the notation
$$ \nu(n):=\sum_{b\mid n}\theta_b $$
we have for any integer $n\geq 1$
$$ \sum_{m\mid n}\sigma_m=\nu(n)^2\geq\sum_{m\mid n}\mu(m). $$

By (18.70) of the book we have, "applying a sieve of dimension 8 (see the Fundamental Lemma 6.3)",
$$ \sum_{n\leq x}\nu^2(n)\frac{\tau^3(n)}{n}\ll\left(\frac{\log x}{\log y}\right)^8. $$
Here $\tau(n)$ is the number of divisors of $n$.
Can anyone help me understand why this is true? The quoted lemma is about Brun's sieve (where $\sigma_m$ would be $\mu(m)$ restricted to certain integers), but even if I accept it for the Selberg sieve, I do not see the stated bound. Similarly, I do not understand why (18.75) in the book is true.

Added. Based on the expert response of Dimitris Koukoulopoulos I think that the proof of Linnik's theorem, as presented in Iwaniec-Kowalski's book, works fine if we replace the Selberg sieve $\sigma_m$ with an upper $\beta$-sieve $\beta_m$. More precisely, we redefine $\nu(n)$ so that
$$\nu(n)^2=\sum_{m\mid n}\beta_m,$$
which makes sense as the right hand side is nonnegative.

Best Answer

The proof given in Iwaniec-Kowalski is, as it stands, wrong. It can be easily fixed, as I explain below.

In general, one can think of $\nu(n)^2$ as the characteristic function of integers with $P^-(n):=\min(p|n)\ge y$. So $$ \sum_{n\le x} \frac{\nu(n)^2 f(n)}{n} \approx \sum_{ n\le x,\ P^-(n)\ge y } \frac{f(n)}{n} \asymp \prod_{y\le p\le x} \left(1+\frac{f(p)}{p}\right) $$ for any reasonable multiplicative function $f$ that is bounded on primes. However, an important restriction is that $f(p)<\kappa$ on average, where $\kappa$ is the sifting dimension. In this case the dimension is 1, whereas $f(p)=8$. So the sum in question, say $S$, does not satisfy a priori the claimed bound. In fact, in this case $S\gg(\log x/\log y)^8$:

If $P^+(n)=\max(p|n)$, then we have that $$ S= \sum_{n\le x}\frac{\nu(n)^2\tau(n)^3}{n} \asymp \sum_{P^+(n)\le x}\frac{\nu(n)^2\tau(n)^3}{n} $$ (this step is heuristic and employed for simplicity). If we let $\sigma_m=\sum_{[d_1,d_2]=m}\theta_{d_1}\theta_{d_2}$ and we open $\nu(m)^2$, we have that $$ S \approx \sum_{P^+(m)\le x} \sigma_m \sum_{P^+(n)\le x,\ m|n} \frac{\tau(n)^3}{n} = P(x) \sum_{P^+(m)\le x} \frac{\sigma_m g(m)}{m}, $$ where $g(m)$ is a multiplicative function with $g(p)=8+O(1/p)$ and $P(x)=\prod_{p\le x}(1+\tau(p)^3/p+\tau(p^2)^3/p^2+\cdots)\asymp(\log x)^8$. Hence $$ S/P(x)\approx \sum_{P^+(d_1d_2)\le x} \frac{\theta_{d_1}\theta_{d_2}g([d_1,d_2])}{[d_1,d_2]} = \sum_{P^+(d_1d_2)\le x} \frac{\theta_{d_1}\theta_{d_2}g(d_1)g(d_2)}{d_1d_2} \frac{(d_1,d_2)}{g((d_1,d_2))}. $$ Writing $f(m)=\prod_{p|m}(1-g(p)/p)$ so that $m/g(m)=\sum_{n|m}f(n)n/g(n)$, we deduce that $$ S/P(x)\approx \sum_{P^+(n)\le x}\frac{f(n)n}{g(n)} \left( \sum_{P^+(d)\le x,\ n|d} \frac{\theta_{d}g(d)}{d}\right)^2. $$ When $y/2 \lt n\le y$, we have that $$ \sum_{P^+(d)\le x,\ n|d} \frac{\theta_{d}g(d)}{d} = \frac{\theta_n g(n)}{n} = \frac{\mu(n)g(n)}{G} \sum_{k\le y,\ (k,q)=1,\ n|k} \frac{\mu^2(k)}{\phi(k)} = \frac{\mu(n)g(n)}{G} \cdot \frac{\textbf{1}_{(n,q)=1}}{\phi(n)} $$ (note that there is an error in the definition of $\theta_b$ in Iwaniec-Kowalski, as one has to restrict them on those $b$ which are coprime to $q$. In fact, $\theta_b=(\mu(b)b/G) \sum_{k\le y,\ (k,q)=1,\ b|k}\mu^2(k)/\phi(k)$). So $$ S \gtrsim \frac{P(x)}{G^2} \sum_{y/2 \lt n\le y,\ (n,q)=1} \frac{\mu^2(n) f(n)g(n)n}{\phi(n)^2} \gg_q (\log x)^8(\log y)^5, $$ by the Selberg-Delange method. This is certainly bigger than what we would need.

In order to remedy this deficiency, one would have to choose $\nu(n)$ in another way, as weights from a sieve of dimension 8 or higher. The easiest choice to work with is, as GH also points out, is to define $\nu$ via the relation $\nu(n)^2=(1*\lambda)(n)$, where $(\lambda(d):d\le D)$ is a $\beta$ upper bound sieve of level $y$ and dimension 8, so that $$ S = \sum_{n\le x} \frac{(1*\lambda)(n)\tau(n)^3}{n}. $$ The point is that the sequence $(\lambda(d))_{d\le D}$ satisfies the Fundamental Lemma (Lemma 6.3 in Iwaniec-Kowalski) in the following strong sense:

(1) $\lambda(d)$ is supported on square-free integers with $P^+(d)\le y$

(2) whenever $\{a(d)\}_{d\ge1}$ is a sequence such that $$ \bigg|\sum_{P^+(d)\le z}\frac{\mu(d)a(dm)}{d}\bigg| \le Ag(m)\prod_{y_0\le p\le z}(1-g(p)/p) \quad\text{when}\ P^-(m)>z, $$ where $A\ge0$ and $y_0\ge1$ are some parameters and $g\ge0$ is multiplicative with $$ \prod_{w\le p\le w'} (1-g(p)/p)^{-1} \le \left(\frac{\log w'}{\log w}\right)^8\left(1+\frac{K}{\log w}\right)\qquad(w'\ge w\ge y_0), $$ we have $$ \sum_{d\le D} \frac{\lambda(d)a(d)}{d} = \sum_{P^+(d)\le y} \frac{\mu(d)a(d)}{d} + O_K\left( Ae^{-u}\prod_{y_0\le p\le y}\left(1-\frac{g(p)}{p}\right) \right), $$ where $u=\log D/\log y$.

Remark: if $a=g$, then the condition on $a$ holds trivially with $A=y_0=1$. Hence, the above statement is indeed a generalization of the usual Fundamental Lemma.

We have that $$ S = \sum_{d\le D} \frac{\lambda(d)a(d)}{d}, $$ where $$ a(d) = \sum_{m\le x/d} \frac{\tau(dm)^3}{m}. $$ If $P^-(m)>z$, then \begin{align*} \sum_{P^+(d)\le z} \frac{\mu(d)a(dm)}{d} &= \sum_{P^+(d)\le z} \frac{\mu(d)}{d}\sum_{k\le x/(dm)}\frac{\tau(dkm)^3}{k} \\ &= \sum_{n\le x/m} \frac{\tau(nm)^3}{n} \sum_{dk=n,\,P^+(d)\le z}\mu(d). \end{align*} By M\"obius inversion, we then conclude that \begin{align*} \sum_{P^+(d)\le z} \frac{\mu(d)\tau(d)^3a(dm)}{d} &= \sum_{n\le x/m,\,P^-(n)>z} \frac{\tau(nm)^3}{n} \\ &\ll \tau(m)^3 (\log x/\log z)^8 \\ &\asymp (\log x)^8\tau(m)^3\prod_{11\le p\le z}(1-\tau(p)^3/p), \end{align*} so that the second axiom holds for $a$ with $A\asymp(\log x)^8$, $y_0=11$ and $g=\tau^3$. We thus conclude that \begin{align*} S = \sum_{d\le D} \frac{\lambda(d)a(d)}{d} &= \sum_{P^+(d)\le y} \frac{\mu(d)a(d)}{d} + O\left( e^{-\log x/\log y} \left(\frac{\log x}{\log y}\right)^8\right) \newline &= \sum_{n\le x,\ P^-(n)>y} \frac{\tau(n)^3 }{n} + O\left( e^{-\log x/\log y} \left(\frac{\log x}{\log y}\right)^8\right) \newline &\ll \left(\frac{\log x}{\log y}\right)^8. \end{align*}

A more general remark: Selberg's sieve is not as flexible as the $\beta$-sieve as far as ``preliminary sieving'' is concerned because it carries inside it the sieve problem it is applied to, in contrast to the $\beta$-sieve weights that only depend on the sifting dimension via the $\beta$ parameter.

EDIT: the monotonicity principle II described in page 49 of "Opera de Cribro" by Friedlander-Iwaniec is a way to use Selberg's sieve weights for preliminary sieving. See also Proposition 7.2 in the same book.

Related Question