The main problem lies in the "large logarithms". Indeed, suppose you want to calculate some quantity in Quantum Field Theory, for instance a Green Function. In perturbation theory this is something like:
$$\tilde{G}(p_1,...,p_n)=\sum_k g^k F_k(p_1,...,p_n)$$
for some generic functions $F$ and $g$ is the coupling constant. It's not enough to require a small $g$. You need small $g$ AND small $F$, for every value of the momenta $p$ (so for every value of the energy scale of your system).
A nice little calculation to understand this point. It's obvious that:
$$\int_0^\infty \frac{dx}{x+a}=[log(x+a)]_0^\infty=\infty$$
Let's use a cutoff:
$$\int_0^\Lambda \frac{dx}{x+a}=log\frac{(\Lambda+a)}{a}$$
This is still infinite if the (unphysical) cutoff is removed. The whole point of renormalization is to show that a finite limit exist (this is "Fourier-dual" to send the discretization interval of the theory to zero). This quantity is finite:
$$\int_0^\Lambda \frac{dx}{x+a}-\int_0^\Lambda \frac{dx}{x+b} \rightarrow log\frac{b}{a}$$
But if $a \rightarrow \infty $ the infinite strikes back!
So for a generic quantity F(p) regularized to F(p)-F(0) we want at least two things: that the coupling is small at that momentum $p$ and that $p$ is not far away from zero. But zero is arbitrary, we can choose an arbitrary (subtraction) scale. So we can vary this arbitrary scale $\mu$ in such a way that it is always near the energy scale we are probing.
Is convenient to take this scale $\mu$ at the same value of the renormalization scale. This is the energy at which you take some finiteness conditions (usually two conditions on the two point Green function and one condition on the 4 point one). The finiteness conditions are real physical measures at an arbitrary energy scale, so they fix the universe in which you live. If you change $\mu$ and you don't change mass, charge, ecc. you are changing universe. The meaning of renormalization group equations is to span the different subtraction points of the theory, remaining in your universe. And of course every physical quantity is independent of these arbitrary scale.
EDIT:
Some extra motivations for the running couplings and renormalization group equations, directly for Schwartz:
The continuum RG is an extremely practical tool for getting partial results for high- order loops from low-order loops. [...]
Recall [...] that the difference between the momentum-space Coulomb potential V (t) at two scales, t1 and t2 , was proportional to [...]
ln t1 for t1 ≪ t2. The RG is able to reproduce this logarithm, and similar logarithms of physical quantities. Moreover, the solution to the RG equation is equivalent to summing series of logarithms to all orders in perturbation theory. With these all-orders results, qualitatively important aspects of field theory can be understood quantitatively. Two of the most important examples are the asymptotic behavior of gauge theories, and critical exponents near second-order phase transitions.
[...]
$$e^2_{eff}(p^2)=\frac{e^2_R}{1-\frac{e^2_R}{12 \pi^2}ln\frac{p^2}{\mu^2}}$$
$$e_R=e_{eff}(\mu)$$
 This is the effective coupling including the 1-loop 1PI graphs, This is called leading- logarithmic resummation.
Once all of these 1PI 1-loop contributions are included, the next terms we are missing should be subleading in some expansion. [...] However, it is not obvious at this point that there cannot be a contribution of the form $ln^2\frac{p^2}{\mu^2}$ from a 2-loop 1PI graph. To check, we would need to perform the full zero order calculation, including graphs with loops and counterterms. As you might imagine, trying to resum large logarithms beyond the leading- logarithmic level diagrammatically is extremely impractical. The RG provides a shortcut to systematic resummation beyond the leading-logarithmic level.
Another example: In supersymmetry you usually have nice (theoretically predicted) renormalization conditions at very high energy for your couplings (this is because you expect some ordering principle from the underlying fundamental theory, string theory for instance). To get predictions for the couplings you must RG evolve all the couplings down to electroweak scale or scales where human perform experiments. Using RG equations ensures that the loop expansions for calculations of observables will not suffer from very large logarithms.
A suggested reference: Schwartz, Quantum Field Theory and the Standard model. See for instance pag. 422 and pag.313.
T-matrix elements are not the same as fully connected diagrams. Remember T-matrix is defined as
$$T=S-I,$$
where $S$ is the S-matrix and $I$ is the identity matrix. S-matrix part includes all possible diagrams except vacuum bubbles and unamputated diagrams. Now we just need to ask, does subtracting an identity matrix remove all the disconnected diagram? Obviously no, because
$$\langle p_1,\ldots,p_m|I|k_1,\ldots,k_n\rangle=\delta_{mn}\sum_{\sigma}\prod_i\delta^4(p_{\sigma(i)}-k_i),$$
where $\sigma$ is a permutation of indices, and I have assumed bosons to avoid sign changes upon permutations. So identity matrix corresponds to diagrams with straight lines not only disconnected, but also containing no vertices, however the S-matrix contains some disconnected diagrams containing vertices, so only a subtraction of $I$ won't take them away.
$2\to2$ scattering is special, because if you actually try to draw diagrams with 4 external lines, it's either disconnected and containing no vertices(since we have excluded vacuum bubbles and unamputated diagrams), or just fully connected, so in this case the substraction of $I$ will take away all disconnected diagrams. This is why for $2\to2$ scattering, we just need to calculate fully connected diagrams. Peskin & Schroeder is potentially confusing because they never go into situations with more than 4 external lines.
Textbooks do not talk much about disconnected diagrams because they can be trivially calculated from its connected components, this silence might be another source of confusions(again, I'll advocate Weinberg here).
In conclusion, there is simply no such thing as "fully connected diagram prescription"(unless you want a fancy name for the $2\to2$ scattering case), and after these clarifications there should be no "contradiction" as described by OP.
Best Answer
There is a Ward identity that links the charge renormalization to the photon's wave function renormalization. Ward identities are relationships between correlation functions that follow from the quantum theory having a symmetry. In this case the gauge invariance of QED relates (among other things) the electron's two point function (propagator) to the $eA\bar{e}$ three point vertex.
If we write out the lagrangian including arbitrary scalings for $A$ and $\psi$ and I also put in a constant $Z_e$ to let the charge scale \begin{equation} \mathcal{L} =-\frac{1}{4} Z_1 F_{\mu\nu} F^{\mu\nu} + i Z_2\bar{\psi}\gamma^\mu\partial_\mu \psi + \sqrt{Z_1}Z_2 Z_e e \bar{\psi} \gamma^\mu A_\mu \psi \end{equation}
$Z_1,Z_2$ and $Z_e$ are all fixed by renormalization to cancel the divergent parts of the loop integrals.
The ward identity says that $Z_2=\sqrt{Z_1}Z_2 Z_e$, or in other words
\begin{equation} Z_e=\frac{1}{\sqrt{Z_1}} \end{equation}
Since $Z_1$ as I have defined it leads to a residue $1/\sqrt{Z_1}$ in the photon propagator, this is equivalent to the equation you wrote above. (by the way this factor of $1/\sqrt{Z_1}$ will appear on all photon propagators, not just the on shell ones).
Note that as a consequence of the ward identity I can rewrite the last two terms in the lagrangian as
\begin{equation} i Z_2\bar{\psi}\gamma^\mu\partial_\mu \psi + \sqrt{Z_1}Z_2 Z_e e \bar{\psi} \gamma^\mu A_\mu \psi= Z_2 i\bar{\psi}\gamma^\mu(\partial_\mu -i e A_\mu)\psi \end{equation}
The right hand side is the gauge covariant derivative.
So when you adjust $Z_1$ to fix the norm of the photon's propagator to 1 (to match the LSZ formula and so forth), you also must adjust the electric charge by an appropriate amount. Alternatively you could go and compute the $eA\bar{e}$ three point function (using a regulator that preserves the gauge invariance, such as dim reg) and you would find that you had to renormalize it by this amount (which amounts to a of the ward identity at 1-loop).
Bonus comment: The constant $Z_1$ which appears in the lagrangian is a 'wave function renormalization', its just a rescaling of the field $A$ by $A\rightarrow \sqrt{Z_1} A$. How do we know what the right value for $Z_1$ is? It's a convention, and the convention is fixed by the LSZ formula. The LSZ formula tells you how to compute observables, and its based on a convention where the photon propagator has residue 1. So if there were no quantum corrections we would set $Z_1=1$. Loops correct the action, so we have to pick a value of $Z_1$ to cancel off the loop contributions. The total $Z$, $Z=Z_1+Z_{loops}$, will end up equaling 1, but we pick $Z_1$ to cancel out the loop contributions. However, we are using $Z_1$ in our definition of the free photon theory around which we are perturbing, and so we have to use $Z_1$ consistently every time we use a photon propagator. (There are actually many conventions for exactly where you put things, this is just one way to think of it.) However, worrying about putting factors of $Z_1$ on photon propagators (or choosing a convention where you put those factors somewhere else) only really starts mattering if you do higher loops, because $Z_1-1$ is already $O(\hbar)$. At your level the main point to realize is what's going on conceptually: the $Z_1$ in the action sets the size of ALL photon propagators (because its really the overall normalization of the photon field). We use the LSZ formula to fix the normalization, but that fixes the normalization for all propagators, not just the external ones.