You hit upon the answer yourself: they are not talking about IR divergences. The renormalisation program is about absorbing unknown (not necessarily divergent) short distance physics into effective local operators. The beauty of this is we do not need a full understanding or rigorous formulation of the short distance physics if we are only interested on much larger distance scales. So you play this game of absorbing UV corrections into counter-terms. All the associated power counting is there to organise the UV corrections into a form that can be absorbed this way, order by order in perturbation theory.
IR divergences are a different beast. They come about because you have massless particles, so it is impossible in practice to distinguish a state with or without "soft" photons (or whatever massless things are in your theory). The way to handle this is to regulate in the IR, carefully compute observable quantities (inclusive cross sections) with wavepackets and so on. It is enough in principle to simply put the system in a box the size of the universe. Anything physical should not depend on the regulator in the end. One thing you definitely can't do is absorb IR divergences - which represent long range effects - into effective local counter-terms. You can integrate out a massless field, but you end up with a non-local effective theory that is a mess to deal with.
The key thing is that you need to be working with canonically normalized fields in order to use the power counting arguments.
Let's expand GR around flat space
\begin{equation}
g_{\mu\nu} = \eta_{\mu\nu} + \tilde{h}_{\mu\nu}
\end{equation}
The reason for the tilde will become clear in a second. So long as $\tilde{h}$ is "small" (or more precisely so long as the curvature $R\sim (\partial^2 \tilde{h})$ is "small"), we can view GR as an effective field theory of a massless spin two particle living on flat Minkowski space.
Then the Einstein Hilbert action takes the schematic form
\begin{equation}
S_{EH}=\frac{M_{pl}^2}{2}\int d^4x \sqrt{-g} R = \frac{M_{pl}^2}{2} \int d^4x \ (\partial \tilde{h})^2 + (\partial \tilde{h})^2\tilde{h}+\cdots
\end{equation}
where $M_{pl}\sim 1/\sqrt{G}$ in units with $\hbar=c=1$. $M_{pl}$ has units of mass. In this form you might thing that the interaction $(\partial \tilde{h})^2 \tilde{h}$ comes with a scale $M^2_{pl}$ with a positive power. However this is too fast--all the QFT arguments you have seen have assumed that the kinetic term had a coefficient of -1/2, not $M_{pl}^2$. Relatedly, given that $M_{pl}$ has units of mass and the action has units of $(mass)^4$, the field $\tilde{h}$ is dimensionless, so it is clearly not normalized the same way as the standard field used in QFT textbooks.
Now classically, the action is only defined up to an overall constant, so we are free to think of $M_{pl}^2$ as being an arbitrary constant. However, in QFT, the action appears in the path integral $Z=\int D\tilde{h}e^{iS[\tilde{h}]/\hbar}$ (note the notational distinction between $\tilde{h}$ and $\hbar$). Thus the overall constant of the action is not a free parameter in QFT, it is fixed and has physical meaning. Alternatively, you have to remember that the Einstein Hilbert action will ultimately be coupled to matter; when we do that, the scale $M_{pl}$ sitting in front of $S_{EH}$ will not multiply the matter action, and so $M_{pl}$ sets the relative scale between the gravitational action and the matter action.
The punchline is that we can't simply ignore the overall scale $M_{pl}^2$, it has physical meaning (ie, we can't absorb $M_{pl}$ into an overall coefficient multiplying the action). On the other hand, we want to put the action into a "standard" form where the overall scale isn't there, so we can apply the normal intuition about power counting. The solution is to work with a "canononically normalized field" $h$, related to $\tilde{h}$ by
\begin{equation}
\tilde{h}_{\mu\nu} = \frac{h_{\mu\nu}}{M_{pl}}
\end{equation}
Then the Einstein Hilbert action takes the form
\begin{equation}
S_{EH} = \int d^4 x \ (\partial h)^2 + \frac{1}{M_{pl}} (\partial h)^2 h + \cdots
\end{equation}
In this form it is clear that the interactions of the form $(\partial h)^2 h$ have a "coupling constant" $1/M_{pl}$ with dimensions 1/mass, which is non-renormalizable by power counting in the usual way.
Best Answer
It should stressed be that Peskin & Schroeder are here using the old Dyson definitions of renormalizability. For a more general derivation of eq. (10.13), see e.g. this Phys.SE post, which also answers several of OP's questions.
OP's main question seems to be the following. Assume that the spacetime dimension $d>2$ and that the theory only has one (scalar) field $\phi$ with a single coupling constant $\lambda$.
An $N$-point (connected) amplitude/correlation function is a sum of infinitely many (connected) Feynman diagrams with a fixed number $N$ of external legs.
If $[\lambda]\geq 0$ [corresponding to the (super)renormalizable case], then only for finitely many natural numbers $N \leq \frac{d}{[\phi]}$, where $[\phi]=\frac{d-2}{2}$, the $N$-point amplitude can contain Feynman diagrams$^1$ with $D\geq 0$.
(The above argument can be generalized to a theory with finitely many types of fields and coupling constants.)
--
$^1$ NB: Be aware that due to possible (UV) divergent subdiagrams, the actual (UV) divergencies may be worse than indicated by $D$.