In many textbooks, the Wilsonian and old-fashioned views of renormalization are treated as totally separate, but they are actually very closely connected. I will use your notation and describe three views of renormalization as would be taught in the three semesters of a typical QFT course.
QFT I: bare perturbation theory
Bare perturbation theory is the way renormalization is first explained in many textbooks, and the way it was first worked out. For concreteness, suppose $\phi^3$ theory is relevant to our world, and we observe that the particles have physical mass $m_p$ and physical interaction strength $g_p$, by measuring cross sections. That is, naively we have a theory with Lagrangian
$$\mathcal{L}_{\text{naive}} = (\partial \phi)^2 + m_p^2 \phi^2 + g_p \phi^3$$
where I suppress all numerical coefficients. Then the tree-level predictions of this theory roughly match that of observations. But this agreement is an illusion. When we go to one-loop order, we see the physical mass and interaction strength are instead predicted to be infinite.
In order to fix this problem, we must impose a cutoff $\Lambda$ and instead use the "bare" Lagrangian
$$\mathcal{L}_{\text{bare}}(\Lambda) = (\partial \phi)^2 + m(\Lambda)^2 \phi^2 + g(\Lambda) \phi^3$$
where $m(\Lambda)$ and $g(\Lambda)$ are formally divergent quantities, fixed so that physical predictions are finite. Concretely, for example, we may calculate the correlator $\langle \phi \phi \phi \rangle \sim g$ as a power series in $g(\Lambda)$, giving a series expansion for $g$ in terms of $g(\Lambda)$. We then flip this around to fix $g(\Lambda)$ in terms of $g$. In this way, we get finite predictions that match experiment.
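To see the shape of this inversion, here is a schematic sketch (coefficients suppressed as above; the precise powers of the coupling and of $\Lambda$ depend on the theory and the spacetime dimension, and $c$ is an illustrative constant, not a computed one):

$$g_p = g(\Lambda) + c\, g(\Lambda)^2 \ln\frac{\Lambda}{m_p} + \cdots \qquad\Longrightarrow\qquad g(\Lambda) = g_p - c\, g_p^2 \ln\frac{\Lambda}{m_p} + \cdots$$

so that, order by order, $g(\Lambda)$ is fixed in terms of the measured coupling $g_p$.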
QFT II: on-shell renormalized perturbation theory
Bare perturbation theory is a bit unsatisfactory. For example, it works perturbatively in $g(\Lambda)$, a formally divergent quantity. And the logic is backwards: we shouldn't change the theory we're working with only after we see that our naive theory doesn't work; we should work with the correct theory from the start.
Instead, in renormalized perturbation theory, we start with the correct Lagrangian and split it as
$$\mathcal{L}_{\text{bare}}(\Lambda) = \mathcal{L}_{\text{naive}} + \mathcal{L}_{\text{CT}}(g_p, \Lambda).$$
Here $\mathcal{L}_{\text{naive}}$ is called the renormalized Lagrangian because it contains renormalized fields and couplings, such as $g_p$. We then perform perturbation theory in $g_p$, treating the counterterms as $O(g_p)$ and higher, which makes much more sense. The conditions used to determine the counterterms are the same as in bare perturbation theory.
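Concretely, the counterterm Lagrangian contains the same operators as $\mathcal{L}_{\text{naive}}$ (a schematic sketch, with numerical coefficients suppressed as above):

$$\mathcal{L}_{\text{CT}}(g_p, \Lambda) = \delta_Z\,(\partial \phi)^2 + \delta_m\, \phi^2 + \delta_g\, \phi^3,$$

where $\delta_Z$, $\delta_m$, and $\delta_g$ are higher order in $g_p$ and are fixed, order by order, by the same conditions that determined $m(\Lambda)$ and $g(\Lambda)$ above.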
To define a theory, it is sufficient to specify $\mathcal{L}_{\text{bare}}(\Lambda)$. On-shell renormalized perturbation theory, which splits this Lagrangian, is an extra layer of structure. The splitting is arbitrary, and in the case of the on-shell scheme is conceptually useful because it allows us to work with physical quantities like $m_p$ and $g_p$ throughout the calculation.
It is often said that "the renormalized Lagrangian is found by adding counterterms to the bare Lagrangian". That is incorrect: the bare Lagrangian is the whole Lagrangian, and we don't add anything to it; the counterterms are added to the renormalized Lagrangian to recover the bare one. Making this mistake swaps the two names, which generates a lot of confusion.
QFT III: Wilsonian renormalization
In the Wilsonian picture, we get the best of both worlds: the naive directness of bare perturbation theory, and the proper setup of renormalized perturbation theory.
In this setup, we imagine we are performing experiments near, but below some energy $\mu$, and find particles with mass $m_p$ and interaction strength $g_p(\mu)$. We may describe these results with a Wilsonian effective action
$$\mathcal{L}_{\text{eff}}(\mu) = \mathcal{L}_{\text{naive}}|_{g = g_p(\mu)}$$
which is a reasonably good description, even when all quantum/loop effects are accounted for, because the loops get cut off at the low scale $\mu$. Thus in the Wilsonian picture observed quantities get translated directly into couplings.
Next, because we are high energy physicists, we want to use this information to find a more fundamental theory, valid up to some higher scale $\Lambda$. Let the Lagrangian for this fundamental theory be $\mathcal{L}_{\text{fund}}(\Lambda)$. Then we have
$$\mathcal{L}_{\text{fund}}(\Lambda) = \mathcal{L}_{\text{eff}}(\mu) + \Delta \mathcal{L}$$
where $\Delta \mathcal{L}$ is found by integrating out degrees of freedom between $\mu$ and $\Lambda$. Now, $\mathcal{L}_{\text{CT}}$ is found by computing the exact same integrals, but between $0$ and $\Lambda$. (The lower bounds differ because the naive Lagrangian accounts for no quantum effects at all, while $\mathcal{L}_{\text{eff}}(\mu)$ accounts for quantum effects up to the scale $\mu$; when $\mu \ll \Lambda$, this difference doesn't matter much.)
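As a sketch of this comparison for the mass term (schematic only; the actual integrand is the full one-loop self-energy, and all numerical factors are dropped):

$$\Delta m^2 \sim g^2 \int_{\mu}^{\Lambda} \frac{d^d q}{(q^2 + m^2)^2}, \qquad \delta m^2_{\text{CT}} \sim g^2 \int_{0}^{\Lambda} \frac{d^d q}{(q^2 + m^2)^2},$$

i.e. the same integral with different lower limits, differing only by the finite contribution from $0$ to $\mu$.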
Therefore, by comparison with what we found earlier,
- the renormalized Lagrangian is the Wilsonian effective Lagrangian
- the bare Lagrangian is the fundamental Lagrangian
- the counterterm is the term needed to compensate for the RG flow between them
Note that in all three versions presented above I've included a finite cutoff. If a continuum limit exists, we may take $\Lambda \to \infty$ for the bare/fundamental Lagrangian.
One final subtlety: when we can take this limit, the counterterms are finite! They are just the difference between the low-energy effective theory and some RG fixed point. We only think counterterms diverge because they diverge order by order in a series expansion. This doesn't mean the whole counterterm diverges; note that
$$\lim_{x \to \infty} \exp(-x) = 0$$
but the terms in the Taylor series diverge individually.
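This can be checked numerically. Below is a minimal sketch (the helper name `taylor_exp_neg` is mine, not from any library; exact rational arithmetic via `fractions` avoids floating-point cancellation in the alternating sum): the partial sums converge to the tiny value $\exp(-x)$, even though the largest individual term grows rapidly with $x$.

```python
from fractions import Fraction
from math import exp

def taylor_exp_neg(x, n_terms=200):
    """Partial sum of exp(-x) = sum_k (-x)^k / k!, computed with exact
    rational arithmetic to avoid floating-point cancellation.
    Returns (partial sum, magnitude of the largest individual term)."""
    xq = Fraction(x)
    term, total, largest = Fraction(1), Fraction(0), Fraction(0)
    for k in range(n_terms):
        total += term
        largest = max(largest, abs(term))
        term *= -xq / (k + 1)  # next term: (-x)^(k+1) / (k+1)!
    return float(total), float(largest)

# The sum shrinks toward exp(-x) while the largest single term grows with x.
for x in (5, 10, 20):
    s, big = taylor_exp_neg(x)
    print(f"x={x:2d}  sum={s:.3e}  exp(-x)={exp(-x):.3e}  largest term={big:.3e}")
```

At $x = 20$ the largest term is of order $10^7$ while the sum is of order $10^{-9}$: the individual terms "diverge" as $x$ grows, the full sum does not.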
There are several interesting questions in the main question, plus a point in the comment that I want to address. Similar ideas are discussed here and in arxiv 0702.365.
Disclaimer: I will only speak about QFTs that have a finite UV cutoff $\Lambda$. That removes all the complications of defining the (non-perturbative) continuum limit $\Lambda\to \infty$, as discussed in the answer linked above. This limit is only interesting if you believe that a particular QFT is the absolute, true description of the universe, and it is fairly certain that this is not the case. Nevertheless, we can be interested in the limit where $\Lambda$ is very large compared to all other energy scales (which is equivalent to taking $\Lambda\to\infty$).
First of all, bare parameters can be physical and measured. They just don't correspond to the same quantities as renormalized parameters. For instance, take the (classical) Ising model. It has one coupling constant, $K=J/T$. Using standard manipulations, one can rewrite the partition function as a field theory with action $$ S[\phi_i]=\sum_{ij} t_{ij} \phi_i \phi_j- \sum_i \ln \cosh \phi_i, $$ where $\phi_i$ is the value of the field on lattice site $i$, and $t_{ij}$ is related to the interaction energy $K$ (see for example this article for the details). Thus, if you know $K$, which is accessible (for instance in simulations), you know the bare parameters! You can also measure this kind of parameter experimentally (the exchange energy).
Notice that this field theory is non-perturbative: if one expands the potential, all the coupling constants (and there are infinitely many of them) are of the same order! The only reason one can use the perturbative $\phi^4$ theory to describe the Ising model is that one is usually interested only in universal quantities, which do not depend on the details of the microscopic theory, as long as the universality class is the same.
For the sake of completeness: if one expands the potential (the $\ln\cosh$) to fourth order, the quadratic term will be called the mass term and the quartic term the interaction. There is also a contribution to the mass coming from $t_{ii}$. One can then show that for $K$ large enough, the potential has two nontrivial minima, corresponding to the ferromagnetic phase. The critical value of $K$ for the transition, denoted $K^0_c$, is at the mean-field level both wrong and non-physical.
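For reference, the expansion used here is

$$\ln\cosh\phi = \frac{\phi^2}{2} - \frac{\phi^4}{12} + O(\phi^6),$$

so (with the minus sign in the action) the potential contributes $-\phi_i^2/2$ to the quadratic part, to be combined with the $t_{ii}$ term, and $+\phi_i^4/12$ as the quartic interaction.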
So far so good. Now let's add the fluctuations of the field, which will "renormalize" the theory. At one loop, one sees that the quadratic term (the "mass") receives a correction proportional to some power of the cutoff; that is, it depends on how the regularization is done, on whether the lattice is cubic or triangular, etc. Is this a problem? Not at all. It is just telling you that the (real, physical) critical coupling $K_c$ is non-universal: it depends strongly on the microscopic details of the system. In some sense, computing these "divergent" (in reality, cutoff-dependent) integrals corresponds to computing the critical temperature from the microscopic physics.
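The cutoff dependence can be sketched as follows (schematic one-loop tadpole correction in the $\phi^4$ description, in dimension $d$, with all numerical factors dropped):

$$\delta m^2 \sim g \int^{\Lambda} \frac{d^d q}{q^2 + m^2} \sim g\, \Lambda^{d-2} \qquad (d > 2),$$

so the correction is dominated by the regularization scale, i.e. by the microscopic details.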
From the Wilsonian RG, we know that some quantities, such as the critical temperature, will depend on the cutoff. They are usually very hard to compute using field theory, since the initial action is non-perturbative and one cannot use the perturbative RG (Wilsonian or not). Only non-perturbative schemes (numerical approaches, or the non-perturbative RG discussed in the arXiv articles linked above) can access these quantities. But there are universal quantities, such as the critical exponents, that can be computed with a perturbative approach, as long as one stays in the same universality class. To compute them, one has to be close to the fixed point of the RG, that is, at energies very low compared to the microscopic scales, which is equivalent to taking $\Lambda\to \infty$.
This leads us to the difference between the Wilsonian RG and the "old school" RG. In the former, one imposes the microscopic value of $K$, and then looks at what the physical mass is.
In the latter, one imposes the physical value of the mass and does not care about the microscopic details, so one wants to send $\Lambda\to \infty$. One thus has to absorb the "correction" to the mass in order to keep it fixed.
So, to (finally, but partially) answer your question "The running of the coupling in Wilson's approach has nothing to do with the bare parameters going to infinity when the cutoff is removed right?" :
In the Wilsonian approach, one starts from the microscopic scale $\Lambda$ and looks at what happens at lower energies, whereas in the "standard" approach, one fixes the macroscopic scale and sends $\Lambda\to \infty$ in order to effectively probe smaller and smaller energy scales.
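Both points of view are controlled by the same flow, written schematically as

$$\mu \frac{d g(\mu)}{d\mu} = \beta\big(g(\mu)\big):$$

the Wilsonian approach integrates this flow downward from an initial condition set at $\Lambda$, while the standard approach fixes the coupling at the macroscopic scale and lets $\Lambda$ run upward.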
Best Answer
I think the confusion is due to a lack of mathematically precise definitions: what is quantum field theory? What is one trying to construct, and how? There are a lot of vague notions used in the physics literature: the partition function (which does not make much sense in infinite volume), the effective action, ... But the bottom line is the collection of all correlation functions of the theory. These (in the Euclidean setting) should be honest Schwartz distributions with singular support contained in the big diagonal (for an $n$-point function in $d$ dimensions, this is the subspace of $\mathbb{R}^{nd}$ where some of the $n$ points coincide). The goal of the RG, Wilsonian or "standard", is to have such correlations converge in the sense of distributions when one removes the cutoff. To understand how this works in a precise manner, you can read the short article "QFT, RG, and all that, for mathematicians, in eleven pages" that I wrote recently.
A much more detailed account of what I tried to explain in the comments below is here: Wilsonian definition of renormalizability