Yes, infrared divergences are guaranteed to be present in any meaningful computational scheme for a loop diagram, including dimensional regularization, because their existence is an objective question, not an artifact of the scheme.
In dim reg, UV divergent integrals have to be calculated in $4-\epsilon$ dimensions: we imagine that the dimension is lower than the dimension of interest (or than the dimension where the integral first becomes divergent).
Similarly, IR divergences become worse in lower dimensions, so for them dim reg means that we imagine working in $4+\epsilon$ dimensions, above the dimension where they become divergent. If both kinds of divergence are present, we must separate the integration region into IR and UV portions and treat them independently, each in a dimension shifted away from four in the opposite direction.
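To see the separation concretely, here is a standard textbook manipulation (a generic scaleless radial integral, not tied to any particular diagram). Splitting at an arbitrary scale $\Lambda$, $$ \int_0^{\infty} dk\, k^{\,d-5} = \underbrace{\int_0^{\Lambda} dk\, k^{\,d-5}}_{\text{IR: needs } d>4} + \underbrace{\int_{\Lambda}^{\infty} dk\, k^{\,d-5}}_{\text{UV: needs } d<4} = \frac{\Lambda^{-\epsilon_{\rm UV}}}{\epsilon_{\rm UV}} - \frac{\Lambda^{-\epsilon_{\rm IR}}}{\epsilon_{\rm IR}}, $$ with $d = 4-\epsilon$ in each piece, each continued from its own safe region. Continuing both pieces to a common $\epsilon$ makes the result vanish, which is why scaleless integrals are set to zero in dim reg even though they hide a cancellation between a UV pole and an IR pole.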
Dim reg generally leads to simple poles only, $1/\epsilon$, at one loop (higher poles $1/\epsilon^{n}$ appear at higher loop orders). The exponent of $\epsilon$ in the pole, generally $-1$ for divergent one-loop integrals in dim reg, is not the same exponent that measures the "degree" of the divergence in momentum space.
The UV and IR divergences are treated analogously in dim reg except that we imagine the dimension to be lowered or raised, respectively. However, their physical interpretations are completely different. UV divergences imply the need to complete the definition of the theory at short distances. IR divergences exist even if the theory is completely well-defined, and they encode genuine physics. They seem pathological, but this should be blamed on the question being asked. If one asks a more physical question, one that can actually be measured, the IR divergences cancel. So the presence of IR divergences doesn't mean that we should add counterterms with parameters whose values would matter.
The right treatment of the theory – renormalization – requires the cancellation of the $1/\epsilon$ terms in dim reg: they're analogous to $\log \Lambda$ terms in some cutoff treatments and have the same interpretation. So we're adding counterterms. You say that "all evidence of counterterms vanishes in the final answer" but this is a misleading interpretation. The more correct comment is that "all evidence of counterterms and divergent loop contributions constructed from the original interaction vertices vanishes in the final answer". They cancel. They're equally important. It's a mistake to forget one of them because it would lead you to a completely incorrect belief that the counterterms are unnecessary. They're necessary to cancel the divergences that were there to start with, before counterterms were added. The coefficients of the counterterms have an infinite part that cancels but the finite remainder survives and produces a genuine finite parameter.
Dim reg is a regularization that doesn't introduce any cutoff scale $\Lambda$ – indeed, divergent terms in the limit $\Lambda\to\infty$ are exactly those that we wanted to avoid by using a different scheme – but its results include $1/\epsilon$ terms that are analogous to $\log\Lambda$ in a cutoff approach. The power-law divergent diagrams $\Lambda^n$ from a cutoff approach typically lead to convergent integrals in dim reg, but their presence in the cutoff approach still implies a sensitivity to the detailed physics at a high-energy scale.
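For concreteness, here is the standard log-divergent one-loop integral (a textbook example, not specific to any particular theory) computed both ways: $$ \int\frac{d^4 k_E}{(2\pi)^4}\frac{1}{(k_E^2+m^2)^2} \to \frac{1}{16\pi^2}\log\frac{\Lambda^2}{m^2}+\text{finite} \quad\text{(cutoff)}, $$ $$ \mu^{\epsilon}\int\frac{d^{4-\epsilon} k_E}{(2\pi)^{4-\epsilon}}\frac{1}{(k_E^2+m^2)^2} = \frac{1}{16\pi^2}\left(\frac{2}{\epsilon}-\log\frac{m^2}{\mu^2}\right)+\text{finite} \quad\text{(dim reg)}, $$ so the $2/\epsilon$ pole plays exactly the role of $\log(\Lambda^2/\mu^2)$, while the quadratic $\Lambda^2$ pieces of other diagrams simply have no counterpart in dim reg.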
While dim reg has no $\Lambda$, it needs the scale $\mu$ to be added to the integrals over the momentum space for simple dimensional reasons (right units). We don't need to imagine that $\mu$ is as high as $\Lambda$ or that it is a cutoff scale; it may be a scale close to the mass scales where interesting phenomena occur in the theory.
Regardless of the choice of regularization technique, the right way to think about fine-tuning etc. is to think about the sensitivity of the low-energy physics to the detailed values chosen for the parameters at the high-energy scale. This dependence on dynamical parameters in the high-energy theory is also the correct invariant explanation of "what is actually wrong about the UV divergences". It's not the infinities themselves; what's bad is the infinite number of unknown parameters one may choose for the high-scale theory in a non-renormalizable theory.
I believe that the paragraphs above answer (almost?) all the questions by the OP, but I chose not to reproduce the precise wording because it sometimes seemed confusing to me, encouraging the answer to be phrased in a framework that isn't quite right.
1) Is this (the second type of counterterms) the only way that power
divergences will be cancelled?
Yes, you need to add counterterms that look like the divergences you find. So in this case, you are exactly right: you need to add a counterterm corresponding to $\phi^2 (\partial\phi)^2$.
2) If yes, are we always able to set counterterms with respect to the
symmetries of the system in this way, so that power divergences can
always be cancelled at all scales (then we do not have to worry about
power divergences at all!)?
I'm not sure what symmetry you are worried about here, but often the power-law divergences will break the symmetries you started with and will require you to add symmetry-violating counterterms. In fact, this is precisely why people tend to focus on the logs, or use regularization methods like dim reg in which the power laws don't appear, so you don't have this annoying problem.
The general principle is that you will need counterterms that violate the symmetry you started with if your regularization method itself breaks the symmetry. An example is QED. Regulating with a hard cutoff violates gauge invariance, so the power-law divergences also violate gauge invariance. (Just compute the one-loop correction to the photon two-point function: gauge invariance tells you it should be proportional to $p^2\eta_{\mu\nu}-p_\mu p_\nu$, but the power-law divergences give different relative contributions to those two terms.)
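To make the statement in the parentheses concrete: the Ward identity $p_\mu \Pi^{\mu\nu}(p)=0$ forces the photon self-energy into the transverse form $$ \Pi^{\mu\nu}(p) = \left(p^2\eta^{\mu\nu}-p^{\mu}p^{\nu}\right)\Pi(p^2), $$ while a hard momentum cutoff generates, in addition, a quadratically divergent piece proportional to $\Lambda^2\,\eta^{\mu\nu}$ alone. That piece is not transverse, it looks like a photon mass, and a gauge-violating photon-mass counterterm is needed to remove it.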
The best solution is to use a regularization scheme that respects all the symmetries. Dim reg is usually a good choice. If you can't find such a regularization scheme, that may be a sign that the symmetry is anomalous and can't be maintained at the quantum level (e.g., massless fermions have a chiral symmetry that is anomalous when you couple them to gauge fields).
If you insist on using a different regularization scheme, you can still get the right answer, but it will take a little more work. You will need to add counterterms that violate the symmetry. However, the real question is whether the physical quantities you compute (such as the S-matrix) respect the symmetry after you renormalize. You will find, if you do things correctly, that the failure of the counterterms to respect the symmetry exactly cancels the failure of the divergences to respect it, and the final answer is symmetric. I would consult Weinberg if you want a precise prescription for how to proceed.
The lesson is that to renormalize, you simply add the counterterms you need to cancel the divergences you find, without thinking about what they mean. Later on, you may need to do some re-interpreting to figure out exactly what the physics is.
Also, it is true that this interaction is non-renormalizable, so you will end up generating an infinite number of counterterms. However, depending on what you are trying to do, that's not necessarily as bad as it sounds.
Best Answer
Question 1
Nontrivial RG flow is the result of explicit breaking of the classical theory's scale invariance in the corresponding quantum field theory. If there are no dimensionful parameters in the classical Lagrangian (the generalization to the presence of masses is straightforward, but not relevant here), we naively expect that after the scaling transformation, $$ \Phi (x) \to e^{\sigma \epsilon}\Phi (e^{\epsilon}x), \quad x \to e^{\epsilon}x, $$ with $\sigma$ being the canonical dimension of the field $\Phi$ and $\epsilon$ the continuous parameter of the transformation, the action is unchanged. The corresponding symmetry implies a conservation law for the dilatation current: $$ \partial_{\mu}D^{\mu} =0 $$ This naive law is completely broken by the infinities of QFT (the breaking is called the trace anomaly), and this is where regularization enters the game. That is, we introduce a dimensionful parameter by hand, and the Lagrangian, initially free of dimensionful parameters, begins to contain one, called $\mu$. Since it is dimensionful, it is called the scale of the theory. However, it is unphysical, and we can't say that it is the scale at which the theory is defined: for example, we can choose it to coincide with the momentum transfer squared, but that is only a formal correspondence that depends on our choice.
In general, because of the presence of the scale, the dilatation current conservation law is modified by quantum corrections. For example, for massless QED $$ \partial_{\mu}D^{\mu} \sim \beta (\alpha)F_{\mu \nu}F^{\mu \nu}, \quad \alpha \equiv \frac{e^{2}}{4 \pi} $$ This leads to nontrivial behaviour of the Lagrangian parameters (such as the couplings) as $\mu$ changes.
What about your question of $\mu$ as the scale at which the theory is defined? The answer is the phenomenon of dimensional transmutation, which occurs because of the breaking of scale invariance described above. Precisely, by solving the RG equation (here $\alpha$ is the running coupling) $$ \mu\frac{d\alpha}{d\mu} = \beta (\alpha (\mu)) $$ we obtain $$ \alpha (\mu) = f(\mu , \mu_{0}, \alpha (\mu_{0})) $$ We can invert this relation and use the dimensionful parameter $\mu_{0}$ instead of $\alpha$ in perturbation theory (this is what people call dimensional transmutation). Such a parameter is a genuinely physical scale: it fixes the set of theory parameters. For example, for QCD it sets the strong coupling scale, which is closely related to the confinement and chiral symmetry breaking scale $\Lambda_{QCD}$. The latter determines the scale at which the effective theory describing hadron interactions works.
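The dimensional-transmutation step can be sketched numerically. Below is a minimal sketch assuming a one-loop beta function $\beta(\alpha) = -b_0\alpha^2$; the coefficient and input values are illustrative stand-ins (roughly QCD-like), not a fit to data. Solving the RG equation gives $\alpha(\mu)$, and inverting it trades the dimensionless pair $(\mu_0, \alpha(\mu_0))$ for the single dimensionful scale $\Lambda$ at which the one-loop coupling blows up.

```python
import math

def alpha_run(mu, mu0, alpha0, b0):
    """One-loop solution of the RG equation mu * d(alpha)/d(mu) = -b0 * alpha^2."""
    return alpha0 / (1.0 + b0 * alpha0 * math.log(mu / mu0))

def lambda_scale(mu0, alpha0, b0):
    """Dimensional transmutation: the dimensionful scale where the one-loop
    coupling diverges, Lambda = mu0 * exp(-1 / (b0 * alpha0)). The input pair
    (mu0, alpha0) can be traded for this single scale Lambda."""
    return mu0 * math.exp(-1.0 / (b0 * alpha0))

# Illustrative (hypothetical) inputs: a QCD-like coupling at mu0 = 91 (GeV units).
mu0, alpha0 = 91.0, 0.118
b0 = 23.0 / (6.0 * math.pi)  # one-loop coefficient for a QCD-like theory, nf = 5

Lam = lambda_scale(mu0, alpha0, b0)

# The running is self-consistent: running from mu0 directly, or via an
# intermediate reference scale, gives the same alpha(mu).
a_500_direct = alpha_run(500.0, mu0, alpha0, b0)
a_200 = alpha_run(200.0, mu0, alpha0, b0)
a_500_via_200 = alpha_run(500.0, 200.0, a_200, b0)
```

The last three lines illustrate RG invariance: the choice of reference point $\mu_0$ is arbitrary, and only the transmuted scale $\Lambda$ carries physical meaning.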
Question 2
1. General remarks
The renormalization scheme precisely defines the renormalization constants, including their finite parts. In general, the renormalization constants are given as (for example, in dimensional regularization) $$ \tag 0 Z_{i} = a_{i} + \sum_{j \geq 1}\frac{c^{(i)}_{j}}{\epsilon^{j}}, $$ We have the freedom to choose the $a_{i}$, while the $c^{(i)}_{j}$ are completely fixed by the structure of the infinities in the theory. Renormalization group invariance states that physical observables are scheme independent.
Your question is the following: suppose we have a specific renormalization scheme in which the scale parameter $\mu$ doesn't affect the parameters of the theory (in particular the mass parameter, which is fixed by the pole of the propagator). Why do we introduce another scheme, in which the mass begins to run and the RG equations enter the game?
The specific scheme is called the on-shell scheme, while the convenient scheme in which the scale appears in the expression for the mass is called minimal subtraction. So what's the point?
2. On-shell renormalization scheme: limitations
Let's assume that you use the on-shell renormalization scheme. In this scheme the $a_{i}$ are nonzero and are uniquely fixed by specific conditions.
Let's take the simplest case, a scalar theory with self-interaction, and concentrate on the mass renormalization. Computing the self-energy in this scheme, you find that the inverse propagator is $$ D^{-1}(p^{2}) = p^{2} - m_{\text{pole}}^{2} - \Sigma (p^{2}), $$ where $m_{\text{pole}}$ is the physical mass, for which $$ \tag 1 D^{-1}(m_{\text{pole}}^{2}) = 0 $$ (since it is observable, it doesn't depend on the $\mu$ scale), and $\Sigma (p^{2})$ is the self-energy. Eq. $(1)$ explicitly reads $$ \tag 2 \Sigma (p^{2} = m_{\text{pole}}^{2}) = 0 $$ Also, the requirement that the propagator has unit residue leads to $$ \tag 3 \left(\frac{d\Sigma (p^{2})}{dp^{2}}\right)_{p^{2} = m_{\text{pole}}^{2}} = 0 $$ This condition, let me remind you, is in fact nothing but the requirement that the pole of the propagator corresponds to a one-particle state.
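A quick sketch of what conditions $(2)$ and $(3)$ buy you: expanding the self-energy around the pole, $$ D^{-1}(p^{2}) = p^{2} - m_{\text{pole}}^{2} - \Sigma(m_{\text{pole}}^{2}) - (p^{2}-m_{\text{pole}}^{2})\left(\frac{d\Sigma}{dp^{2}}\right)_{p^{2}=m_{\text{pole}}^{2}} + \mathcal{O}\!\left((p^{2}-m_{\text{pole}}^{2})^{2}\right). $$ Condition $(2)$ kills the constant term and condition $(3)$ kills the linear correction, so near the pole $D(p^{2}) \approx 1/(p^{2}-m_{\text{pole}}^{2})$: a pole at the physical mass with unit residue.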
Note two things about $\Sigma (p^{2})$ in the on-shell scheme. The first is that it doesn't depend on the scale $\mu$, since the mass $m_{\text{pole}}$ is indeed scale independent; this result, of course, is regularization scheme independent. The second is that condition $(3)$ can't be satisfied in the limit $m_{\text{pole}}^{2} \to 0$, since in the massless limit the Källén–Lehmann representation of the propagator (which ties the pole of the Green function to the one-particle state) doesn't contain an isolated pole: a one-particle state with zero mass isn't separated from the many-particle states.
We can't deal with this problem without introducing a regularizing scale $m_{\text{reg}}^{2}$. It is indeed unphysical, and in general this is the price for obtaining $\mu$-independent quantities in theories with massless states. Note that most realistic theories contain massless states. For example, QED in the on-shell prescription suffers from IR divergences in the self-energy because the photon mass is exactly zero.
To avoid such singularities, we need to change the renormalization scheme.
3. Extra: the minimal subtraction scheme
In this scheme, all the $a_{i}$ in Eq. $(0)$ are zero. The inverse propagator now is $$ D^{-1}(p^{2}) = p^{2} - m^{2} - \Sigma (p^{2}, \alpha , \mu), $$ where $\alpha$ is the set of other couplings present in the theory (here the couplings of the cubic and quartic terms).
For $p^{2} = m_{\text{pole}}^{2}$, $$ D^{-1}(m_{\text{pole}}^{2}) = m_{\text{pole}}^{2} - m^{2} - \Sigma (m_{\text{pole}}^{2}(m^{2}, \alpha , \mu), \alpha, \mu) = 0, $$ or at the lowest order of perturbation theory $$ m_{\text{pole}}^{2} = m^{2} + \Sigma (m^{2}, \alpha , \mu) $$ In this scheme $\Sigma$ depends on $\mu$ explicitly. But $m_{\text{pole}}$ doesn't depend on it, so we conclude that $m^{2}$ and $\alpha$ must depend on $\mu$.
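As a final sketch (schematic, keeping only the lowest order and ignoring the running of $\alpha$ itself, which enters at higher order): differentiating the lowest-order relation with respect to $\log\mu$ and using the $\mu$-independence of $m_{\text{pole}}$ gives $$ 0 = \mu\frac{d m_{\text{pole}}^{2}}{d\mu} = \mu\frac{d m^{2}}{d\mu} + \mu\frac{\partial \Sigma(m^{2},\alpha,\mu)}{\partial \mu} + \dots \quad\Rightarrow\quad \mu\frac{d m^{2}}{d\mu} = -\,\mu\frac{\partial \Sigma}{\partial \mu} \equiv \gamma_{m}\, m^{2}, $$ which is exactly the RG equation for the running mass in minimal subtraction.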