Question 1
Nontrivial RG flow is the result of explicit breaking of classical theory scale invariance in corresponding quantum field theory. If there is no dimensionful parameters in classical lagrangian of corresponding theory (the generalization on the presence of masses is straightforward, but is not relevant here), we naively expect that after scaling transformation,
$$
\Phi (x) \to e^{\sigma \epsilon}\Phi (e^{\epsilon}x), \quad x \to e^{\epsilon}x,
$$
with $\sigma$ being the canonical dimension of $\Phi$ field and $\epsilon$ being the continuous parameter of transformation, the action will be unchanged. Corresponding symmetry defines dilatation current conservation law:
$$
\partial_{\mu}D^{\mu} =0
$$
This naive law is completely broken by infinities of QFT (such breaking is called the trace anomaly), because of which regularization enters the game. I.e, we introduce the dimensionful parameter by hand, and initial free from dimensionful parameters lagrangian begins to contain the one, called $\mu$. Since it is dimensionful, it is called the scale of theory. However, it is unphysical, and we can't say that it is the scale on which theory is defined: for example, we can choose it so that it coincides with square of transfer momentum, but it is only formal correspondence which depends on our wish.
In general, because of the presence of scale the dilatation current conservation law is modified by quantum corrections. For example, for massless QED
$$
\partial_{\mu}D^{\mu} \sim \beta (\alpha)F_{\mu \nu}F^{\mu \nu}, \quad \alpha \equiv \frac{e^{2}}{4 \pi}
$$
This leads to nontrivial behaviour of lagrangian parameters (like couplings) with changing of $\mu$.
What's about your question, $\mu$ as the scale on which theory is defined? The answer is dimensional transmutation phenomena, which occurs because of described above breaking of scaling invarince. Precisely, by solving RG equation (here $\alpha$ is the running coupling)
$$
\mu\frac{d\alpha}{d\mu} = \beta (\alpha (\mu))
$$
we obtain that
$$
\alpha (\mu) = f(\mu , \mu_{0}, \alpha (\mu_{0}))
$$
We can invert this relation and use dimensionful parameter $\mu_{0}$ instead of $\alpha $ in perturbation theory (and this is what people call the dimensional transmutation). Such parameter is really physical scale: it defines set of theory parameters. For example, for QCD it defines the strong coupling scale, which is closely related to the confinement and chiral symmetry breaking scale $\Lambda_{QCD}$. The latter determines the scale at which effective theory which describes hadrons interactions works.
Question 2
1. General remarks
The scheme of renormalization precisely defines renormalization constants, including their finite part. In general, the renormalization constants are given as (for example, for dimensional regularization)
$$
\tag 0 Z_{i} = a_{i} + \sum_{j = 1}\frac{c^{(i)}_{j}}{\epsilon^{j}},
$$
We have the freedom for choosing $a_{i}$, while $c^{(i)}_{j}$ are completely fixed by the structure of infinities in theory. The renormalization group states that the scheme dependence of the physical observables is absent.
Your question is following: suppose that we have specifit renormalization scheme for which the scale parameter $\mu$ doesn't affect parameters of theory - particularly, the mass parameter, which is fixed by the pole of propagator - why do we introduce the other scheme, for which the mass becomes to run and the RG equations enter the game?
The specific renormalization scheme is called on-shell scheme, while the convenient scheme with the precence of the scale in the expression for the mass is called minimal subtraction. So what's the point?
2. On-shell renormalization scheme: limitations
Let's assume that you use on-shell renormalization scheme. For this scheme $a_{i}$ is not zero, and it is uniquely fixed by specific conditions.
Lets assume the simplest case - scalar theory with self-interaction, and lets concentrate on the mass renormalization. After computing the self-energy by keeping this scheme you have that the propagator is
$$
D^{-1}(p^{2}) = p^{2} - m_{\text{pole}}^{2} - \Sigma (p^{2}),
$$
where $m_{\text{pole}}$ is the physical mass, for which
$$
\tag 1 D^{-1}(m_{\text{pole}}^{2}) = 0
$$
(since it is observable than it doesn't depend on the $\mu$ scale), and $\Sigma (p^{2})$ is self-energy. Eq. $(1)$ expicitly reads
$$
\tag 2 \Sigma (p^{2} = m_{\text{pole}}^{2}) = 0
$$
Also, the requirement that the propagator has the unity residue leads to the statement that
$$
\tag 3 \left(\frac{d\Sigma (p^{2})}{dp^{2}}\right)_{p^{2} = m_{\text{pole}}^{2}} = 0
$$
This condition, let me remind, in fact is nothing but the requirement that the propagator corresponds to the one-particle state.
Note two things about $\Sigma (p^{2})$ in on-shell scheme. The first one is that it doesn't depend on the scale $\mu$ since the mass $m_{\text{pole}}$ is indeed scale independent, and this result, of course, is regularization scheme independent. The second is that the condition $(3)$ can't be satisfied in the limit of $m_{\text{pole}}^{2} = 0$, since in massless limit Callen-Lemman representation of the propagator (which just binds the pole of the Green function with one-particle state) doesn't contain isolated pole: the one-particle state with zero energy isn't different from many-particle states.
We can't deal with this problem without introducing regularizing scale $m_{\text{reg}}^{2}$. It is indeed unphysical, and in general this is the price for obtaining $\mu$-scale independent quantities in theories with massless states. Note that the most of realistic theories are ones with massless states. For example, QED in on-shell prescription suffers from IR divergences in self-energy because of exactly zero photon mass.
To avoid such singularities, we need to change the renormalization scheme.
3. Extra: the minimal subtraction scheme
For this scheme, all $a_{i}$s in Eq. $(0)$ are zero. So that, Eq. $(1)$ now is
$$
D^{-1}(p^{2}) = p^{2} - m^{2} - \Sigma (p^{2}, \alpha , \mu),
$$
where $\alpha$ is the set of other couplings which are present in theory (here they are couplings for cubic, quartic terms).
For $p^{2} = m_{\text{pole}}^{2}$
$$
D^{-1}(m_{\text{pole}}^{2}) = m_{\text{pole}}^{2} - m^{2} - \Sigma (m_{\text{pole}}^{2}(m^{2}, \alpha , \mu), \alpha, \mu) = 0,
$$
or at the lowest order of perturbation theory
$$
m_{\text{pole}}^{2} = m^{2} + \Sigma (m^{2}, \alpha , \mu)
$$
In this scheme $\Sigma$ depends on $\mu$ explicitly. But the $m_{\text{pole}}$ doesn't depend on it, so that we come to the statement that $m^{2}$ and $\alpha$ depend on $\mu$.
Best Answer
Your questions are closely related :)
The overall factors $(\not{p} - m)$ are indeed responsible for amputating the external legs. I don't have a copy of Schwartz on hand so I can't comment on what he might have meant, but the correlation functions do have propagators on the external legs before putting them on shell by applying LSZ.
The mass that goes into the LSZ formula is the physical, observable mass (so long as you normalize everything properly)--the mass of the asymptotic state. You can see this if you go through the LSZ derivation. The reason that the pole of the propagator corresponds to the physical mass is precisely because when the LSZ formula puts external particles on shell, it "picks out" the poles in the propagators (since you have $(\not{p} - m)\times 1 / (\not{p} - m)$), at the physical mass.