In many textbooks, the Wilsonian and old-fashioned views of renormalization are treated as totally separate, but they are actually very closely connected. I will use your notation and describe three views of renormalization as would be taught in the three semesters of a typical QFT course.
QFT I: bare perturbation theory
Bare perturbation theory is the way renormalization is first explained in many textbooks, and the way it was first worked out. For concreteness, suppose $\phi^3$ theory is relevant to our world, and we observe that the particles have physical mass $m_p$ and physical interaction strength $g_p$, by measuring cross sections. That is, naively we have a theory with Lagrangian
$$\mathcal{L}_{\text{naive}} = (\partial \phi)^2 + m_p^2 \phi^2 + g_p \phi^3$$
where I suppress all numerical coefficients. Then the tree-level predictions of this theory roughly match observations. But this agreement is an illusion. When we go to one-loop order, we see the physical mass and interaction strength are instead predicted to be infinite.
In order to fix this problem, we must impose a cutoff $\Lambda$ and instead use the "bare" Lagrangian
$$\mathcal{L}_{\text{bare}}(\Lambda) = (\partial \phi)^2 + m(\Lambda)^2 \phi^2 + g(\Lambda) \phi^3$$
where $m(\Lambda)$ and $g(\Lambda)$ are formally divergent quantities, fixed so that physical predictions are finite. Concretely, for example, we may calculate the correlator $\langle \phi \phi \phi \rangle \sim g_p$ as a power series in $g(\Lambda)$, giving a series expansion for $g_p$ in terms of $g(\Lambda)$. We then invert this series to fix $g(\Lambda)$ in terms of $g_p$. In this way, we get finite predictions that match experiment.
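As a schematic illustration of this inversion, suppose the one-loop result for the physical coupling takes a logarithmic form. The coefficient $c$ and the logarithmic dependence on $\Lambda$ are assumptions made for this sketch, not results computed here:
$$g_p = g(\Lambda) + c\, g(\Lambda)^3 \ln\frac{\Lambda}{m_p} + O\big(g(\Lambda)^5\big).$$
Solving order by order for the bare coupling then gives
$$g(\Lambda) = g_p - c\, g_p^3 \ln\frac{\Lambda}{m_p} + O\big(g_p^5\big).$$
At fixed $g_p$, the correction term grows without bound as $\Lambda \to \infty$, which is the sense in which $g(\Lambda)$ is formally divergent.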
QFT II: on-shell renormalized perturbation theory
Bare perturbation theory is a bit unsatisfactory. For example, it works perturbatively in $g(\Lambda)$, a formally divergent quantity. And the logic is not in the right order: we shouldn't change the theory we're working with once we see our naive theory doesn't work; we should just work with the correct theory from the start.
Instead, in renormalized perturbation theory, we start with the correct Lagrangian and split it as
$$\mathcal{L}_{\text{bare}}(\Lambda) = \mathcal{L}_{\text{naive}} + \mathcal{L}_{\text{CT}}(g_p, \Lambda).$$
Here $\mathcal{L}_{\text{naive}}$ is called the renormalized Lagrangian because it contains renormalized fields and couplings, such as $g_p$. We then perform perturbation theory in $g_p$, treating the counterterms as $O(g_p)$ and higher, which makes much more sense. The conditions used to determine the counterterms are the same as in bare perturbation theory.
To define a theory, it is sufficient to specify $\mathcal{L}_{\text{bare}}(\Lambda)$. On-shell renormalized perturbation theory, which splits this Lagrangian, is an extra layer of structure. The splitting is arbitrary; the on-shell scheme is conceptually useful because it allows us to work with physical quantities like $m_p$ and $g_p$ throughout the calculation.
It is often said that "the renormalized Lagrangian is found by adding counterterms to the bare Lagrangian". That is incorrect, because the bare Lagrangian is the whole Lagrangian; we don't add anything to it. Making this mistake causes the names to be swapped, which generates a lot of confusion.
QFT III: Wilsonian renormalization
In the Wilsonian picture, we get the best of both worlds: the naive directness of bare perturbation theory, and the proper setup of renormalized perturbation theory.
In this setup, we imagine we are performing experiments near, but below, some energy $\mu$, and find particles with mass $m_p$ and interaction strength $g_p(\mu)$. We may describe these results with a Wilsonian effective action
$$\mathcal{L}_{\text{eff}}(\mu) = \mathcal{L}_{\text{naive}}|_{g = g_p(\mu)}$$
which is a reasonably good description, even when all quantum/loop effects are accounted for, because the loops get cut off at the low scale $\mu$. Thus in the Wilsonian picture observed quantities get translated directly into couplings.
Next, because we are high energy physicists, we want to use this information to find a more fundamental theory, valid up to some higher scale $\Lambda$. Let the Lagrangian for this fundamental theory be $\mathcal{L}_{\text{fund}}(\Lambda)$. Then we have
$$\mathcal{L}_{\text{fund}}(\Lambda) = \mathcal{L}_{\text{eff}}(\mu) + \Delta \mathcal{L}$$
where $\Delta \mathcal{L}$ is found by integrating out degrees of freedom between $\mu$ and $\Lambda$. Now, $\mathcal{L}_{\text{CT}}$ is found by computing the exact same integrals, but between $0$ and $\Lambda$. (The difference in the lower bound arises only because the naive Lagrangian accounts for no quantum effects at all, while $\mathcal{L}_{\text{eff}}(\mu)$ accounts for quantum effects up to scale $\mu$; it doesn't matter all that much.)
Therefore, by comparison with what we found earlier,
- the renormalized Lagrangian is the Wilsonian effective Lagrangian
- the bare Lagrangian is the fundamental Lagrangian
- the counterterm is the term needed to compensate for the RG flow between them
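The last identification can be illustrated with a toy flow for a single coupling. This is only a sketch under stated assumptions: the beta function $dg/d\ln(\text{scale}) = -b\,g^3$, the constant `b`, and the function names are all hypothetical choices made for illustration, not the actual $\phi^3$ computation. The "counterterm" here is just the difference $g(\Lambda) - g(\mu)$, compared with its first-order expansion in $g(\mu)$.

```python
import math

def run_coupling(g_mu, b, log_ratio):
    """Exact solution of the assumed toy flow dg/dln(scale) = -b * g**3,
    run upward from mu to Lambda, with log_ratio = ln(Lambda / mu)."""
    return g_mu / math.sqrt(1.0 + 2.0 * b * g_mu**2 * log_ratio)

def counterterm(g_mu, b, log_ratio):
    """Toy counterterm: bare coupling g(Lambda) minus effective coupling g(mu)."""
    return run_coupling(g_mu, b, log_ratio) - g_mu

def one_loop_counterterm(g_mu, b, log_ratio):
    """First term of the expansion of the toy counterterm in powers of g_mu."""
    return -b * g_mu**3 * log_ratio

if __name__ == "__main__":
    g_mu, b = 0.5, 1.0  # hypothetical values, chosen only for illustration
    for log_ratio in (0.01, 1.0, 100.0, 1.0e6):
        exact = counterterm(g_mu, b, log_ratio)
        first_order = one_loop_counterterm(g_mu, b, log_ratio)
        print(f"ln(Lambda/mu)={log_ratio:g}  exact={exact:.6f}  first order={first_order:.6f}")
```

For small $\ln(\Lambda/\mu)$ the two agree closely. As $\ln(\Lambda/\mu)$ grows, the first-order term grows without bound while the exact difference in this toy flow stays bounded, approaching $-g(\mu)$.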
Note that in all three versions presented above I've included a finite cutoff. If a continuum limit exists, we may take $\Lambda \to \infty$ for the bare/fundamental Lagrangian.
One final subtlety: when we can take this limit, the counterterms are finite! They are just the difference between the low-energy effective theory and some RG fixed point. We only think counterterms diverge because they diverge order by order in a series expansion. This doesn't mean the whole counterterm diverges; note that
$$\lim_{x \to \infty} \exp(-x) = 0$$
but the terms in the Taylor series diverge individually.
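A small numerical check makes the point concrete. The sketch below uses exact rational arithmetic to avoid floating-point cancellation; the helper name `taylor_terms` and the choices `x = 10` and 60 terms are illustrative assumptions.

```python
from fractions import Fraction
import math

def taylor_terms(x, n_terms):
    """Return the first n_terms terms (-x)**n / n! of the Taylor series of exp(-x)."""
    terms = []
    term = Fraction(1)  # n = 0 term
    for n in range(n_terms):
        terms.append(term)
        term = term * Fraction(-x) / (n + 1)  # build the next term from the current one
    return terms

x = 10
terms = taylor_terms(x, 60)
largest = max(abs(t) for t in terms)  # size of the biggest individual term
total = sum(terms)                    # exact partial sum of the series

print("largest term magnitude:", float(largest))
print("partial sum:", float(total))
print("math.exp(-x):", math.exp(-x))
```

The largest individual term is of order thousands and keeps growing as `x` is increased, while the sum stays close to the small value of $\exp(-x)$.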
Frankly, I am lost about your logic trail, not the book's. You seem to somehow connect the (super)symmetry variations (B) to the dynamical variations yielding the EOM, and predicate one on the other? Nothing of the sort is even implied in that book.
(A) is invariant under (B), as you confirmed. Its equations of motion need not take cognizance of (B), and hold whether you've "noticed" the symmetry (B) or not.
Your next step asks you to particularize (A) to (C), as I've corrected it. It, too, has the supersymmetry (B), but you are not asked to consider, or even recognize, that, up front. You are asked to find its equations of motion, and observe how "easy" elimination of F is in them, reducing three such to two, slightly messier ones. At no point have you connected to (B); it is a merely "optional" observation, basically at a "right angle" to the EOM. (Later on, you would use these EOM to confirm the on-shell conservation of the supercharge, but this is not apparent at the step you are working on.)
That's a good question.
The linear sigma model (11.14) has 3 terms, and hence 3 counterterms, and hence needs 3 renormalization conditions (11.17a+b+c).
Yes. Since the $N$th component is $$\phi^N(x)~=~ v +\sigma(x) \tag{11.8}$$ and the VEV$^1$ is defined as $$ v~:=~\frac{\mu}{\sqrt{\lambda}},\tag{11.7}$$ condition (11.16) for a connected 1-point function is equivalent to $$\langle \sigma \rangle^c_{J=0}~=~0,$$ i.e. the tadpoles for the $\sigma$ field vanish$^2$ $$ \left(\fbox{1PI}==\stackrel{\sigma}{=}==\right)_{\text{amputated}}~=~0,\tag{11.17a}$$ cf. e.g. my Phys.SE answer here.
Yes, diagrams with more and more interaction terms do enter on the LHS of eq. (11.17a). For explicit calculations at one-loop, see eqs. (11.31) & (11.32).
It is important to realize that the field $\phi~=~\phi_0/\sqrt{Z_{\phi}}$ and the VEV (11.7) scale differently under the RG flow. In particular, since both sides of eq. (11.7) should remain$^3$ finite/physical, one should not try to e.g. introduce infinite $Z$-factors and/or counterterms into eq. (11.7), cf. Refs. 1 & 2.
References:
M.E. Peskin & D.V. Schroeder, An Intro to QFT, 1995; p. 353-355.
M. Srednicki, QFT, 2007; chapter 31. A prepublication draft PDF file is available here.
--
$^1$ Here the mass parameter $\mu$ in the Lagrangian (11.14) is the renormalized mass. It should not be conflated with the physical mass $m$, which satisfies
$$ \left(==\stackrel{\sigma}{=}==\fbox{1PI}==\stackrel{\sigma}{=}==\right)_{\text{amputated}}~=~m^2-\mu^2\quad \text{at}\quad p^2=m^2.$$
$^2$ The tadpoles for the $\pi^k$ fields vanish automatically, due to a $\mathbb{Z}_2$ symmetry.
$^3$ Later in dimensional regularization one typically makes the coupling constant $\lambda\to\lambda\tilde{\mu}^{\epsilon}$ dimensionless, cf. Ref. 2.