Dimensional regularization (dim-reg) is a method for regulating divergent integrals. Instead of working in $4$ dimensions, where loop integrals diverge, you work in $4-\epsilon$ dimensions. This trick lets you isolate the divergent part of the integral, just as a cutoff does. However, it treats all divergences equally, so you can't distinguish a quadratic divergence from a logarithmic one using dim-reg. All it really does is hide the fine-tuning, not fix the problem.
As an example, let's do the mass renormalization of $\phi^4$ theory. The diagram gives
\begin{equation}
\int \frac{ - i \lambda }{ 2} \frac{ i }{ \ell ^2 - m ^2 + i \epsilon } \frac{ d ^4 \ell }{ (2\pi)^4 } = \lim _{ \epsilon \rightarrow 0 }\frac{ - i \lambda }{ 2} \frac{ - m ^2 }{ 16 \pi ^2 } \left( \frac{ 2 }{ \epsilon } + 1 + \log 4 \pi - \log m ^2 - \gamma \right)
\end{equation}
where I have used the ``master formula'' in the back of Peskin and Schroeder, Eq. (A.44) (note that this $ \epsilon $ doesn't have anything to do with the $ \epsilon $ in the propagator). This gives a mass renormalization of
\begin{equation}
\delta m ^2 = \lim _{ \epsilon \rightarrow 0 } \frac{ \lambda m ^2 }{ 32 \pi ^2 } \left( \frac{ 2 }{ \epsilon } + 1 + \log 4 \pi - \log m ^2 - \gamma \right)
\end{equation}
Keeping only the divergent part:
\begin{equation}
\delta m ^2 = \lim _{ \epsilon \rightarrow 0 } \frac{ \lambda m ^2 }{ 16 \pi ^2 } \frac{ 1 }{ \epsilon }
\end{equation}
This is the same result as the one you arrived at above, but with a different regulator: you regulated your integral with a cutoff, I regulated mine with dim-reg. The mass correction diverges as $ \sim \frac{1}{ \epsilon }$. This is where the sensitivity to the UV physics is stored.
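As a quick numerical check of where the $\frac{1}{\epsilon}$ poles come from, here is a minimal sketch. It assumes the standard fact that the master formula produces $\Gamma(2 - d/2)$ for a logarithmically divergent integral and $\Gamma(1 - d/2)$ for a quadratically divergent one, with $d = 4 - \epsilon$. The function names are mine and purely illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def gamma_log_divergent(eps):
    """Gamma(2 - d/2) at d = 4 - eps (logarithmically divergent integral)."""
    return math.gamma(eps / 2.0)

def gamma_quadratic_divergent(eps):
    """Gamma(1 - d/2) at d = 4 - eps (quadratically divergent integral)."""
    return math.gamma(-1.0 + eps / 2.0)

for eps in (1e-1, 1e-2, 1e-3):
    # Leading Laurent expansions around eps = 0
    log_expansion = 2.0 / eps - EULER_GAMMA
    quad_expansion = -(2.0 / eps + 1.0 - EULER_GAMMA)
    print(eps,
          gamma_log_divergent(eps) - log_expansion,
          gamma_quadratic_divergent(eps) - quad_expansion)
```

Both Gamma functions have a simple pole at $\epsilon = 0$ with the same $\frac{2}{\epsilon}$ strength (up to sign), and the printed differences shrink linearly with $\epsilon$. Nothing in the pole itself remembers whether the original integral was logarithmically or quadratically divergent.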
A cutoff, which is a dimensionful number, tells you something very physical, the scale of new physics. The $\epsilon$ is unphysical, just a useful parameter.
With a cutoff, the result scales with the cutoff according to how bad your divergence is: logarithmically, quadratically, or quartically. This has real physical significance, namely how sensitive the result is to the high-energy physics. Dim-reg regulated integrals, however, always diverge the same way, like $ \frac{1}{ \epsilon } $. Dim-reg doesn't care how your integral diverges: whether the integral is logarithmically or quadratically divergent, you still get a $ \frac{1}{ \epsilon }$ dependence. The reason is that $ \epsilon $ is not a physical quantity here. It's just a useful trick to regulate the integrals.
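To make the cutoff scaling concrete, here is a small sketch comparing two Wick-rotated one-loop integrals with a hard Euclidean cutoff $\Lambda$: $\int \frac{d^4\ell_E}{(2\pi)^4}\frac{1}{\ell_E^2+m^2}$ (quadratically divergent) and $\int \frac{d^4\ell_E}{(2\pi)^4}\frac{1}{(\ell_E^2+m^2)^2}$ (logarithmically divergent). The hard cutoff on $|\ell_E|$ and all function names are my assumptions for illustration, and the closed forms are cross-checked with a simple midpoint rule.

```python
import math

LOOP_FACTOR = 1.0 / (16.0 * math.pi**2)

def tadpole_cutoff(cutoff, m):
    """(1/16 pi^2) [Lambda^2 - m^2 ln(1 + Lambda^2/m^2)]: quadratic in the cutoff."""
    return LOOP_FACTOR * (cutoff**2 - m**2 * math.log1p(cutoff**2 / m**2))

def bubble_cutoff(cutoff, m):
    """(1/16 pi^2) [ln(1 + r) - r/(1 + r)], r = Lambda^2/m^2: logarithmic in the cutoff."""
    r = cutoff**2 / m**2
    return LOOP_FACTOR * (math.log1p(r) - r / (1.0 + r))

def radial_integral(power, cutoff, m, steps=100_000):
    """Midpoint-rule value of (1/8 pi^2) int_0^Lambda l^3 dl / (l^2 + m^2)^power."""
    h = cutoff / steps
    total = 0.0
    for i in range(steps):
        l = (i + 0.5) * h
        total += l**3 / (l**2 + m**2) ** power
    return total * h / (8.0 * math.pi**2)

m = 1.0
for cutoff in (100.0, 200.0, 400.0):
    print(cutoff, tadpole_cutoff(cutoff, m), bubble_cutoff(cutoff, m))
```

Doubling $\Lambda$ multiplies the first integral by roughly $4$ but only adds roughly $\frac{\ln 4}{16\pi^2}$ to the second. That difference in sensitivity to the cutoff is exactly the information that $\frac{1}{\epsilon}$ does not display.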
Since dim-reg hides the type of divergence that you have, people like to say that dim-reg solves the fine-tuning problem, because by using it you don't get to see how bad your divergence is. This viewpoint is clearly flawed, since the quadratic divergences are still there; they just appear to be on the same footing as logarithmic divergences when you use dim-reg.
In short, the fine-tuning problem isn't really fixed by dim-reg, but if you use it you can pretend the problem isn't there. This is by no means a solution to the fine-tuning, unless someone develops an intuition for why dim-reg is the ``correct'' way to regulate your integrals, i.e., a physical meaning for $ \epsilon $ (and it's safe to say there isn't one).
Question 1
Nontrivial RG flow results from the explicit breaking of classical scale invariance in the corresponding quantum field theory. If there are no dimensionful parameters in the classical Lagrangian (the generalization to the case with masses is straightforward but not relevant here), we naively expect that after the scaling transformation
$$
\Phi (x) \to e^{\sigma \epsilon}\Phi (e^{\epsilon}x), \quad x \to e^{\epsilon}x,
$$
with $\sigma$ being the canonical dimension of the field $\Phi$ and $\epsilon$ the continuous parameter of the transformation, the action will be unchanged. The corresponding symmetry implies the conservation law for the dilatation current:
$$
\partial_{\mu}D^{\mu} =0
$$
This naive law is broken by the infinities of QFT (this breaking is called the trace anomaly), which is where regularization enters the game. That is, we introduce a dimensionful parameter by hand, so the Lagrangian, initially free of dimensionful parameters, now contains one, called $\mu$. Since it is dimensionful, it is called the scale of the theory. However, it is unphysical, and we can't say that it is the scale at which the theory is defined: for example, we can choose $\mu^{2}$ to coincide with the squared momentum transfer, but this is only a formal correspondence that depends on our choice.
In general, because of the presence of this scale, the dilatation current conservation law is modified by quantum corrections. For example, for massless QED
$$
\partial_{\mu}D^{\mu} \sim \beta (\alpha)F_{\mu \nu}F^{\mu \nu}, \quad \alpha \equiv \frac{e^{2}}{4 \pi}
$$
This leads to a nontrivial dependence of the Lagrangian parameters (such as couplings) on $\mu$.
What about your question: is $\mu$ the scale at which the theory is defined? The answer is the phenomenon of dimensional transmutation, which occurs because of the breaking of scale invariance described above. Precisely, by solving the RG equation (here $\alpha$ is the running coupling)
$$
\mu\frac{d\alpha}{d\mu} = \beta (\alpha (\mu))
$$
we obtain that
$$
\alpha (\mu) = f(\mu , \mu_{0}, \alpha (\mu_{0}))
$$
We can invert this relation and use the dimensionful parameter $\mu_{0}$ instead of $\alpha $ in perturbation theory (this is what people call dimensional transmutation). Such a parameter is a genuinely physical scale: it fixes the set of parameters of the theory. For example, for QCD it defines the strong coupling scale, which is closely related to the confinement and chiral symmetry breaking scale $\Lambda_{QCD}$. The latter determines the scale at which the effective theory describing hadron interactions works.
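A minimal sketch of dimensional transmutation, under assumptions that are mine rather than part of the argument above: I take the one-loop QCD beta function $\mu\frac{d\alpha_s}{d\mu} = -\frac{\beta_0}{2\pi}\alpha_s^2$ with $\beta_0 = 11 - \frac{2}{3}n_f$, ignore flavor thresholds, and use $\alpha_s(91.19\ \text{GeV}) = 0.118$ as an illustrative input. The function names are hypothetical.

```python
import math

def beta0(n_flavors):
    """One-loop QCD coefficient, beta0 = 11 - 2 n_f / 3."""
    return 11.0 - 2.0 * n_flavors / 3.0

def run_alpha(alpha0, mu0, mu, n_flavors=5):
    """Solve mu d(alpha)/d(mu) = -(beta0 / 2 pi) alpha^2 from mu0 to mu."""
    b = beta0(n_flavors) / (2.0 * math.pi)
    return alpha0 / (1.0 + b * alpha0 * math.log(mu / mu0))

def transmuted_scale(alpha0, mu0, n_flavors=5):
    """Scale where the one-loop coupling blows up: mu0 * exp(-1 / (b alpha0))."""
    b = beta0(n_flavors) / (2.0 * math.pi)
    return mu0 * math.exp(-1.0 / (b * alpha0))

alpha_ref, mu_ref = 0.118, 91.19  # illustrative reference point, mu in GeV
scale_from_ref = transmuted_scale(alpha_ref, mu_ref)
alpha_at_10 = run_alpha(alpha_ref, mu_ref, 10.0)
scale_from_10 = transmuted_scale(alpha_at_10, 10.0)
print(scale_from_ref, scale_from_10)
```

Any point $(\mu_0, \alpha(\mu_0))$ on the same RG trajectory gives the same dimensionful scale, so the pair can be traded for that single scale. This is the sense in which the scale, unlike $\mu$ itself, is physical.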
Question 2
1. General remarks
A renormalization scheme precisely defines the renormalization constants, including their finite parts. In general, the renormalization constants have the form (for example, in dimensional regularization)
$$
\tag 0 Z_{i} = a_{i} + \sum_{j = 1}\frac{c^{(i)}_{j}}{\epsilon^{j}},
$$
We have the freedom to choose $a_{i}$, while the $c^{(i)}_{j}$ are completely fixed by the structure of infinities in the theory. The renormalization group states that physical observables do not depend on the scheme.
Your question is the following: suppose we have a specific renormalization scheme in which the scale parameter $\mu$ doesn't affect the parameters of the theory (in particular the mass parameter, which is fixed by the pole of the propagator). Why do we then introduce another scheme, in which the mass runs and the RG equations enter the game?
That specific scheme is called the on-shell scheme, while the convenient scheme in which the scale appears in the expression for the mass is called minimal subtraction. So what's the point?
2. On-shell renormalization scheme: limitations
Let's assume that you use the on-shell renormalization scheme. In this scheme $a_{i}$ is nonzero, and it is uniquely fixed by specific conditions.
Let's take the simplest case, a scalar theory with self-interaction, and concentrate on the mass renormalization. After computing the self-energy in this scheme, the inverse propagator is
$$
D^{-1}(p^{2}) = p^{2} - m_{\text{pole}}^{2} - \Sigma (p^{2}),
$$
where $m_{\text{pole}}$ is the physical mass, for which
$$
\tag 1 D^{-1}(m_{\text{pole}}^{2}) = 0
$$
(since it is an observable, it doesn't depend on the scale $\mu$), and $\Sigma (p^{2})$ is the self-energy. Eq. $(1)$ explicitly reads
$$
\tag 2 \Sigma (p^{2} = m_{\text{pole}}^{2}) = 0
$$
Also, the requirement that the propagator has unit residue leads to the condition
$$
\tag 3 \left(\frac{d\Sigma (p^{2})}{dp^{2}}\right)_{p^{2} = m_{\text{pole}}^{2}} = 0
$$
Recall that this condition is nothing but the requirement that the propagator corresponds to the one-particle state.
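Conditions $(2)$ and $(3)$ can be pictured as two subtractions applied to an unrenormalized self-energy. The sketch below is only a toy: `toy_self_energy` is a made-up smooth function standing in for a regulated one-loop result, and the subtraction constants play the role of the finite parts $a_{i}$ that the on-shell scheme fixes.

```python
import math

def toy_self_energy(s, heavy_scale_sq=25.0, coupling=0.1):
    """Hypothetical stand-in for an unrenormalized self-energy Sigma_0(p^2 = s)."""
    return coupling * (3.0 + 0.5 * s + s * math.log(1.0 + s / heavy_scale_sq))

def derivative(f, s, h=1e-5):
    """Central finite-difference derivative df/ds."""
    return (f(s + h) - f(s - h)) / (2.0 * h)

def on_shell_subtract(sigma, m_pole_sq):
    """Return Sigma(s) = Sigma_0(s) - Sigma_0(m^2) - (s - m^2) Sigma_0'(m^2)."""
    sigma_at_pole = sigma(m_pole_sq)
    slope_at_pole = derivative(sigma, m_pole_sq)

    def renormalized(s):
        return sigma(s) - sigma_at_pole - (s - m_pole_sq) * slope_at_pole

    return renormalized

m_pole_sq = 1.0
sigma_os = on_shell_subtract(toy_self_energy, m_pole_sq)
print(sigma_os(m_pole_sq), derivative(sigma_os, m_pole_sq), sigma_os(4.0))
```

The subtracted function vanishes at $p^{2} = m_{\text{pole}}^{2}$ together with its first derivative, while remaining nontrivial away from the pole.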
Note two things about $\Sigma (p^{2})$ in the on-shell scheme. The first is that it doesn't depend on the scale $\mu$, since the mass $m_{\text{pole}}$ is scale independent, and this result is, of course, independent of the regularization scheme. The second is that condition $(3)$ can't be satisfied in the limit $m_{\text{pole}}^{2} = 0$, since in the massless limit the Källén-Lehmann representation of the propagator (which ties the pole of the Green function to the one-particle state) contains no isolated pole: the one-particle state with zero energy is indistinguishable from many-particle states.
We can't deal with this problem without introducing a regularizing scale $m_{\text{reg}}^{2}$. It is unphysical, and in general this is the price for obtaining $\mu$-independent quantities in theories with massless states. Note that most realistic theories have massless states. For example, QED in the on-shell prescription suffers from IR divergences in the self-energy because the photon mass is exactly zero.
To avoid such singularities, we need to change the renormalization scheme.
3. Extra: the minimal subtraction scheme
In this scheme all the $a_{i}$ in Eq. $(0)$ are zero, so the inverse propagator is now
$$
D^{-1}(p^{2}) = p^{2} - m^{2} - \Sigma (p^{2}, \alpha , \mu),
$$
where $\alpha$ is the set of other couplings present in the theory (here, the couplings of the cubic and quartic terms).
For $p^{2} = m_{\text{pole}}^{2}$
$$
D^{-1}(m_{\text{pole}}^{2}) = m_{\text{pole}}^{2} - m^{2} - \Sigma (m_{\text{pole}}^{2}(m^{2}, \alpha , \mu), \alpha, \mu) = 0,
$$
or, at lowest order of perturbation theory,
$$
m_{\text{pole}}^{2} = m^{2} + \Sigma (m^{2}, \alpha , \mu)
$$
In this scheme $\Sigma$ depends on $\mu$ explicitly. But $m_{\text{pole}}$ doesn't depend on it, so we conclude that $m^{2}$ and $\alpha$ must depend on $\mu$.
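Here is a minimal numerical sketch of this cancellation. The inputs are my assumptions, not part of the argument above: the standard one-loop $\overline{\text{MS}}$ expressions for a $\frac{\lambda}{4!}\phi^{4}$ theory, namely $\Sigma = \frac{\lambda m^{2}}{32\pi^{2}}\left(\ln\frac{m^{2}}{\mu^{2}} - 1\right)$ and $\mu\frac{dm^{2}}{d\mu} = \frac{\lambda m^{2}}{16\pi^{2}}$, with $\lambda$ held fixed because its running is a higher-order effect.

```python
import math

LOOP_FACTOR = 1.0 / (16.0 * math.pi**2)

def running_mass_sq(m0_sq, mu0, mu, lam):
    """Solve mu d(m^2)/d(mu) = lam m^2 / (16 pi^2) with lam held fixed."""
    return m0_sq * (mu / mu0) ** (lam * LOOP_FACTOR)

def self_energy(m_sq, mu, lam):
    """Assumed one-loop MS-bar tadpole: (lam m^2 / 32 pi^2) (ln(m^2/mu^2) - 1)."""
    return 0.5 * lam * LOOP_FACTOR * m_sq * (math.log(m_sq / mu**2) - 1.0)

def pole_mass_sq(m0_sq, mu0, mu, lam):
    """Lowest-order relation m_pole^2 = m^2(mu) + Sigma(m^2(mu), mu)."""
    m_sq = running_mass_sq(m0_sq, mu0, mu, lam)
    return m_sq + self_energy(m_sq, mu, lam)

lam, m0_sq, mu0 = 0.5, 1.0, 1.0
for mu in (0.5, 1.0, 2.0, 4.0):
    print(mu, running_mass_sq(m0_sq, mu0, mu, lam), pole_mass_sq(m0_sq, mu0, mu, lam))
```

The running mass changes at order $\lambda$ as $\mu$ varies, while the combination $m^{2} + \Sigma$ changes only at order $\lambda^{2}$, which is beyond the accuracy of the one-loop formulas used.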
Best Answer
The prescription is actually more or less the same as with dimensional regularization. It is based on the observation that power divergences can be absorbed by counterterms at all scales. In the example you give, $a_n$ is a polynomial in the external momenta, such as $p^2$ where $p$ is an external momentum, so simply adding a counterterm $p^2\frac{\Lambda}{m}$ absorbs this divergence at all scales.
Now let me show a general statement: UV divergences from loop integrals are always proportional to polynomials in the external momenta. The proof can be found in Weinberg. To illustrate it, consider the following concrete example:
\begin{equation}
I(p)=\int_0^\infty\frac{k^m\,dk}{(k+p)^n}
\end{equation}
For this integral to have power divergences, we require that $m\ge n$. If we differentiate $I(p)$ $m+2-n$ times with respect to $p$, we get
\begin{equation}
\frac{d^{m+2-n}I(p)}{dp^{m+2-n}}\propto\int_0^\infty\frac{k^m\,dk}{(k+p)^{m+2}}
\end{equation}
which is UV finite, and by dimensional analysis it must be proportional to $\frac{1}{p}$. Now if we integrate the above equation over $p$ $m-n+2$ times, we get
\begin{equation}
I(p)=\sum_{a=0}^{m-n+1}c_a\,p^a\Lambda^{m-n+1-a}+d\cdot p^{m-n+1}\ln\frac{\Lambda}{p}
\end{equation}
where $c_a$, $d$ and $\Lambda$ are constants. As we can see in this example, the divergences are always multiplied by polynomials in the external momentum.
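A small numerical sketch of this example for the particular case $m=2$, $n=1$, which I picked for illustration. With a cutoff $\Lambda$ on $k$, the integral has the closed form $I = \frac{\Lambda^2}{2} - p\Lambda + p^2\ln\left(1+\frac{\Lambda}{p}\right)$, and differentiating $m+2-n = 3$ times under the integral sign gives $-6\int_0^\Lambda \frac{k^2\,dk}{(k+p)^4}$. The function names and the midpoint-rule quadrature are my own choices.

```python
import math

def toy_integral(p, cutoff):
    """I(p) = int_0^cutoff k^2 dk / (k + p), closed form for m = 2, n = 1."""
    return 0.5 * cutoff**2 - p * cutoff + p**2 * math.log1p(cutoff / p)

def third_derivative_integral(p, cutoff, steps=200_000):
    """Midpoint-rule value of d^3 I / dp^3 = -6 int_0^cutoff k^2 dk / (k + p)^4."""
    h = cutoff / steps
    total = 0.0
    for i in range(steps):
        k = (i + 0.5) * h
        total += k**2 / (k + p) ** 4
    return -6.0 * total * h

for p in (1.0, 2.0):
    # The integral itself grows like cutoff^2; its third derivative settles near -2/p.
    print(p, toy_integral(p, 1e3), third_derivative_integral(p, 1e3), -2.0 / p)
```

The integral itself grows like $\Lambda^2$, while its third derivative is insensitive to $\Lambda$ and approaches $-\frac{2}{p}$, in line with the $\frac{1}{p}$ behaviour expected from dimensional analysis. Integrating back three times reproduces a $p^2\ln p$ term plus a quadratic polynomial in $p$ whose coefficients carry the cutoff dependence.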
The general lesson from this example is that, after taking enough derivatives with respect to the external momenta, the loop integrals become finite and are Laurent polynomials in the external momenta. When we integrate back over the external momenta to recover the original integrals, we find that the divergences are always proportional to polynomials in the external momenta.
With this in mind, we can always choose local counterterms to cancel power divergences at all scales, so we do not need to worry about them. Logarithmic divergences, however, can only be cancelled at one particular scale. It is this fact that gives rise to the running of the parameters and amplitudes.