The proof given in Iwaniec-Kowalski is, as it stands, wrong. It can be easily fixed, as I explain below.
In general, one can think of $\nu(n)^2$ as the characteristic function of integers with $P^-(n):=\min(p|n)\ge y$. So
$$
\sum_{n\le x} \frac{\nu(n)^2 f(n)}{n} \approx \sum_{ n\le x,\ P^-(n)\ge y } \frac{f(n)}{n} \asymp \prod_{y\le p\le x} \left(1+\frac{f(p)}{p}\right)
$$
for any reasonable multiplicative function $f$ that is bounded on primes. However, an important restriction is that $f(p)<\kappa$ on average, where $\kappa$ is the sifting dimension. In this case the dimension is 1, whereas $f(p)=8$. So the sum in question, say $S$, does not satisfy a priori the claimed bound. In fact, in this case $S\gg(\log x/\log y)^8$:
If $P^+(n)=\max(p|n)$, then we have that
$$
S= \sum_{n\le x}\frac{\nu(n)^2\tau(n)^3}{n}
\asymp \sum_{P^+(n)\le x}\frac{\nu(n)^2\tau(n)^3}{n}
$$
(this step is heuristic and employed for simplicity). If we let $\sigma_m=\sum_{[d_1,d_2]=m}\theta_{d_1}\theta_{d_2}$ and we open $\nu(m)^2$, we have that
$$
S \approx \sum_{P^+(m)\le x} \sigma_m \sum_{P^+(n)\le x,\ m|n} \frac{\tau(n)^3}{n}
= P(x) \sum_{P^+(m)\le x} \frac{\sigma_m g(m)}{m},
$$
where $g(m)$ is a multiplicative function with $g(p)=8+O(1/p)$ and $P(x)=\prod_{p\le x}(1+\tau(p)^3/p+\tau(p^2)^3/p^2+\cdots)\asymp(\log x)^8$. Hence
$$
S/P(x)\approx \sum_{P^+(d_1d_2)\le x} \frac{\theta_{d_1}\theta_{d_2}g([d_1,d_2])}{[d_1,d_2]}
= \sum_{P^+(d_1d_2)\le x} \frac{\theta_{d_1}\theta_{d_2}g(d_1)g(d_2)}{d_1d_2} \frac{(d_1,d_2)}{g((d_1,d_2))}.
$$
Writing $f(m)=\prod_{p|m}(1-g(p)/p)$ so that $m/g(m)=\sum_{n|m}f(n)n/g(n)$, we deduce that
$$
S/P(x)\approx \sum_{P^+(n)\le x}\frac{f(n)n}{g(n)} \left( \sum_{P^+(d)\le x,\ n|d} \frac{\theta_{d}g(d)}{d}\right)^2.
$$
When $y/2 \lt n\le y$, we have that
$$
\sum_{P^+(d)\le x,\ n|d} \frac{\theta_{d}g(d)}{d}
= \frac{\theta_n g(n)}{n} = \frac{\mu(n)g(n)}{G} \sum_{k\le y,\ (k,q)=1,\ n|k} \frac{\mu^2(k)}{\phi(k)}
= \frac{\mu(n)g(n)}{G} \cdot \frac{\textbf{1}_{(n,q)=1}}{\phi(n)}
$$
(note that there is an error in the definition of $\theta_b$ in Iwaniec-Kowalski, as one has to restrict them on those $b$ which are coprime to $q$. In fact, $\theta_b=(\mu(b)b/G) \sum_{k\le y,\ (k,q)=1,\ b|k}\mu^2(k)/\phi(k)$). So
$$
S \gtrsim \frac{P(x)}{G^2} \sum_{y/2 \lt n\le y,\ (n,q)=1} \frac{\mu^2(n) f(n)g(n)n}{\phi(n)^2}
\gg_q (\log x)^8(\log y)^5,
$$
by the Selberg-Delange method. This is certainly bigger than what we would need.
In order to remedy this deficiency, one would have to choose $\nu(n)$ in another way, as weights from a sieve of dimension 8 or higher. The easiest choice to work with is, as GH also points out, is to define $\nu$ via the relation $\nu(n)^2=(1*\lambda)(n)$, where $(\lambda(d):d\le D)$ is a $\beta$ upper bound sieve of level $y$ and dimension 8, so that
$$
S = \sum_{n\le x} \frac{(1*\lambda)(n)\tau(n)^3}{n}.
$$
The point is that the sequence $(\lambda(d))_{d\le D}$ satisfies the Fundamental Lemma (Lemma 6.3 in Iwaniec-Kowalski) in the following strong sense:
(1) $\lambda(d)$ is supported on square-free integers with $P^+(d)\le y$
(2) whenever $\{a(d)\}_{d\ge1}$ is a sequence such that
$$
\bigg|\sum_{P^+(d)\le z}\frac{\mu(d)a(dm)}{d}\bigg|
\le Ag(m)\prod_{y_0\le p\le z}(1-g(p)/p)
\quad\text{when}\ P^-(m)>z,
$$
where $A\ge0$ and $y_0\ge1$ are some parameters and $g\ge0$ is multiplicative with
$$
\prod_{w\le p\le w'} (1-g(p)/p)^{-1} \le \left(\frac{\log w'}{\log w}\right)^8\left(1+\frac{K}{\log w}\right)\qquad(w'\ge w\ge y_0),
$$
we have
$$
\sum_{d\le D} \frac{\lambda(d)a(d)}{d} = \sum_{P^+(d)\le y} \frac{\mu(d)a(d)}{d}
+ O_K\left( Ae^{-u}\prod_{y_0\le p\le y}\left(1-\frac{g(p)}{p}\right) \right),
$$
where $u=\log D/\log y$.
Remark: if $a=g$, then the condition on $a$ holds trivially with $A=y_0=1$. Hence, the above statement is indeed a generalization of the usual Fundamental Lemma.
We have that
$$
S = \sum_{d\le D} \frac{\lambda(d)a(d)}{d},
$$
where
$$
a(d) = \sum_{m\le x/d} \frac{\tau(dm)^3}{m}.
$$
If $P^-(m)>z$, then
\begin{align*}
\sum_{P^+(d)\le z} \frac{\mu(d)a(dm)}{d}
&= \sum_{P^+(d)\le z} \frac{\mu(d)}{d}\sum_{k\le x/(dm)}\frac{\tau(dkm)^3}{k} \\
&= \sum_{n\le x/m} \frac{\tau(nm)^3}{n} \sum_{dk=n,\,P^+(d)\le z}\mu(d).
\end{align*}
By M\"obius inversion, we then conclude that
\begin{align*}
\sum_{P^+(d)\le z} \frac{\mu(d)\tau(d)^3a(dm)}{d}
&= \sum_{n\le x/m,\,P^-(n)>z} \frac{\tau(nm)^3}{n} \\
&\ll \tau(m)^3 (\log x/\log z)^8 \\
&\asymp (\log x)^8\tau(m)^3\prod_{11\le p\le z}(1-\tau(p)^3/p),
\end{align*}
so that the second axiom holds for $a$ with $A\asymp(\log x)^8$, $y_0=11$ and $g=\tau^3$. We thus conclude that
\begin{align*}
S = \sum_{d\le D} \frac{\lambda(d)a(d)}{d}
&= \sum_{P^+(d)\le y} \frac{\mu(d)a(d)}{d}
+ O\left( e^{-\log x/\log y} \left(\frac{\log x}{\log y}\right)^8\right) \newline
&= \sum_{n\le x,\ P^-(n)>y} \frac{\tau(n)^3 }{n}
+ O\left( e^{-\log x/\log y} \left(\frac{\log x}{\log y}\right)^8\right) \newline
&\ll \left(\frac{\log x}{\log y}\right)^8.
\end{align*}
A more general remark: Selberg's sieve is not as flexible as the $\beta$-sieve as far as ``preliminary sieving'' is concerned because it carries inside it the sieve problem it is applied to, in contrast to the $\beta$-sieve weights that only depend on the sifting dimension via the $\beta$ parameter.
EDIT: the monotonicity principle II described in page 49 of "Opera de Cribro" by Friedlander-Iwaniec is a way to use Selberg's sieve weights for preliminary sieving. See also Proposition 7.2 in the same book.
Best Answer
If $Y\geq X\geq 2$, then as Daniel Johnston wrote, Montgomery and Vaughan proved that
$$\pi(Y)-\pi(Y-X)\leq \frac{2X}{\log X}.$$
Whether this constitutes a "best bound" requires a definition of what you consider to be "best". Expounding on Noam Elkies' comment, it follows from work of Heath-Brown (building on Hoheisel, Tchudakov, Heath-Brown and Iwaniec, Huxley, and many others) that if $Y^{7/12}\leq X\leq Y$, then
$$\pi(Y)-\pi(Y-X)\sim\frac{X}{\log Y}.$$
Therefore, for all $\epsilon>0$ there exists $Y_0(\epsilon)>0$ such that if $Y\geq Y_0(\epsilon)$ and $Y^{7/12}\leq X\leq Y$, then
$$\Big|\pi(Y)-\pi(Y-X)-\frac{X}{\log Y}\Big|\leq \epsilon \frac{X}{\log Y}.$$
(This can be made more precise using known error terms in the prime number theorem for short intervals.) The notion of "best" depends on context.