I think you misunderstood what the professor wanted to say. To understand this, let us evaluate the integral more thoroughly (your expressions contain some mistakes). If we use the dimensional regularization prescription $d\rightarrow d-2\epsilon$ and an additional mass scale $\mu$, we get for the integral in question the following result:
$$\int \frac{d^{d-2\epsilon}p}{(2\pi)^{d-2\epsilon}}\frac{1}{(p^2+\mu^2)^2}=\frac{\Gamma(2-d/2+\epsilon)}{(4\pi)^{d/2-\epsilon}}\mu^{-2(2-d/2+\epsilon)}.$$
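As a sanity check on the formula above, here is a minimal sketch that compares it with a direct radial integration in dimensions where the integral converges ($D=d-2\epsilon<4$). The function names, the substitution $p=\mu\tan\theta$, and the midpoint rule are illustrative choices, not part of the lecture.

```python
import math

def closed_form(D, mu):
    """Gamma(2 - D/2) / (4 pi)^(D/2) * mu^(-2 (2 - D/2)), valid for D < 4."""
    return math.gamma(2 - D / 2) / (4 * math.pi) ** (D / 2) * mu ** (-2 * (2 - D / 2))

def numeric(D, mu, n=20000):
    """Radial integral with p = mu * tan(theta), evaluated by the midpoint rule."""
    # Surface area of the unit sphere in D dimensions.
    area = 2 * math.pi ** (D / 2) / math.gamma(D / 2)
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        # p^(D-1) dp / (p^2 + mu^2)^2  ->  mu^(D-4) sin^(D-1) cos^(3-D) dtheta
        total += math.sin(theta) ** (D - 1) * math.cos(theta) ** (3 - D)
    return area / (2 * math.pi) ** D * mu ** (D - 4) * total * h

for D in (1, 2, 3):
    print(D, closed_form(D, 1.5), numeric(D, 1.5))
```

The two columns should agree to many digits for $D=1,2,3$; at $D=4$ the radial integral no longer converges, which is where the $\Gamma(\epsilon)$ pole comes from.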
For $d=4$ we get
$$\int\frac{d^{4-2\epsilon}p}{(2\pi)^{4-2\epsilon}}\frac{1}{(p^2+\mu^2)^2}=\frac{\Gamma(\epsilon)}{16\pi^2}\left(\frac{\mu^2}{4\pi}\right)^{-\epsilon}.$$
Expanding this at $\epsilon\rightarrow 0,$ we arrive at
$$\int\frac{d^{4}p}{(2\pi)^{4}}\frac{1}{(p^2+\mu^2)^2}\approx\frac{1}{16\pi^2}\left[\frac{1}{\epsilon}-\gamma+\log(4\pi)-\log(\mu^2)\right].$$
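The expansion can be checked numerically. The sketch below compares $\Gamma(\epsilon)\left(\mu^2/4\pi\right)^{-\epsilon}$ with the truncated bracket for a few small values of $\epsilon$; the value $\mu^2=2$ and the helper names are arbitrary assumptions for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def bracket_exact(eps, mu2):
    """Gamma(eps) * (mu^2 / (4 pi))^(-eps): the d = 4 result without the 1/(16 pi^2) prefactor."""
    return math.gamma(eps) * (mu2 / (4 * math.pi)) ** (-eps)

def bracket_expanded(eps, mu2):
    """1/eps - gamma + log(4 pi) - log(mu^2): the expansion with O(eps) terms dropped."""
    return 1 / eps - EULER_GAMMA + math.log(4 * math.pi) - math.log(mu2)

for eps in (1e-2, 1e-3, 1e-4):
    diff = bracket_exact(eps, 2.0) - bracket_expanded(eps, 2.0)
    print(eps, diff)
```

The difference should shrink roughly linearly with $\epsilon$, as expected for terms of order $\epsilon$ that were dropped.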
In the massless limit, i.e. $\mu\rightarrow 0$, the logarithm diverges. So what can we say about the nature of this divergence?
As can be concluded from power counting, a positive $\epsilon$ corresponds to curing UV divergences, while a negative one cures IR divergences. First, let us assume that we are dealing with UV divergences and identify $\epsilon=\epsilon_{UV}.$ What can we say about the remaining divergent term? We can observe that the whole integral has to vanish (which is proven earlier in the lecture), and this happens only when the remaining terms add up to minus the $1/\epsilon$ term, i.e.
$$\frac{1}{\epsilon_{UV}}=\gamma-\log(4\pi)+\log(\mu^2).$$
Next, let us assume that we are dealing with divergences from the infrared, and identify $\epsilon=\epsilon_{IR}.$ Evaluating the integral then gives exactly the same result, but with $\epsilon_{UV}$ replaced by $\epsilon_{IR}$. The condition for the integral to vanish is now
$$\frac{1}{\epsilon_{IR}}=\gamma-\log(4\pi)+\log(\mu^2).$$
But the right-hand side is just the same as in the condition for the UV! This means we actually get
$$\epsilon_{UV}=\epsilon_{IR}.$$
As the lecturer has pointed out, this can be interpreted as dimensional regularization "taming" both the UV and the IR simultaneously.
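A common textbook way to see the same identification, sketched here under the assumption that the massless integral is split at an arbitrary scale $\Lambda$ (a symbol not used above), is to regulate the two momentum regions separately. The radial integrand is $p^{-1-2\epsilon}$, and each piece converges only for one sign of $\epsilon$:
$$\int_0^{\Lambda}dp\,p^{-1-2\epsilon}=-\frac{\Lambda^{-2\epsilon}}{2\epsilon}\quad(\epsilon<0),\qquad \int_{\Lambda}^{\infty}dp\,p^{-1-2\epsilon}=\frac{\Lambda^{-2\epsilon}}{2\epsilon}\quad(\epsilon>0).$$
Labelling the two regulators $\epsilon_{IR}$ and $\epsilon_{UV}$ and including the angular factor $2\pi^2/(2\pi)^4$ gives
$$\int\frac{d^{4-2\epsilon}p}{(2\pi)^{4-2\epsilon}}\frac{1}{p^4}=\frac{1}{16\pi^2}\left(\frac{1}{\epsilon_{UV}}-\frac{1}{\epsilon_{IR}}\right)+O(\epsilon),$$
so the vanishing of the scaleless integral amounts to setting $\epsilon_{UV}=\epsilon_{IR}$.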
1) Is this (the second type of counterterms) the only way that power divergences will be cancelled?
Yes, you need to add counterterms that look like the divergences you find. So in this case you are exactly right: you need to add a counterterm corresponding to $\phi^2 (\partial\phi)^2$.
2) If yes, are we always able to set counterterms with respect to the symmetries of the system in this way, so that power divergences can always be cancelled at all scales (then we do not have to worry about power divergences at all!)?
I'm not sure which symmetry you are worried about here, but often the power-law divergences will break the symmetries you started with, and you will then have to add counterterms that break them too. In fact, this is precisely why people tend to focus on the logs, or use regularization methods like dim reg, where the power laws don't appear and you don't have this annoying problem.
The general principle is that you will need counterterms that violate the symmetry you started with if your regularization method itself breaks the symmetry. An example is QED. Regulating with a hard cutoff violates gauge invariance, so the power-law divergences also violate gauge invariance. (Just compute the one-loop correction to the photon two-point function: gauge invariance tells you it should be proportional to $p^2\eta_{\mu\nu}-p_\mu p_\nu$, but the power-law divergences will give different relative contributions to those two terms.)
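To make the gauge-invariance statement concrete, here is a small sketch that contracts a tensor of the form $a\,\eta_{\mu\nu}-b\,p_\mu p_\nu$ with $p^\mu$. The transverse combination $a=p^2$, $b=1$ gives zero, while shifting $a$ by a cutoff-sized piece $\Lambda^2$ does not. The momentum components and the value of `cutoff2` are made up for illustration and are not the result of an actual loop calculation.

```python
ETA = [1.0, -1.0, -1.0, -1.0]  # diagonal of the Minkowski metric, signature (+,-,-,-)

def lower(p_upper):
    """Lower the index: p_mu = eta_{mu nu} p^nu."""
    return [ETA[m] * p_upper[m] for m in range(4)]

def tensor(p_upper, a, b):
    """Build T_{mu nu} = a * eta_{mu nu} - b * p_mu p_nu as a 4x4 nested list."""
    p_low = lower(p_upper)
    return [[a * (ETA[m] if m == n else 0.0) - b * p_low[m] * p_low[n]
             for n in range(4)] for m in range(4)]

def contract(p_upper, T):
    """Return p^mu T_{mu nu}, which must vanish for a gauge-invariant two-point function."""
    return [sum(p_upper[m] * T[m][n] for m in range(4)) for n in range(4)]

p = [3.0, 1.0, 2.0, 0.5]                          # hypothetical momentum
p2 = sum(x * y for x, y in zip(p, lower(p)))      # p^2 = p^mu p_mu
cutoff2 = 100.0                                   # hypothetical Lambda^2 piece

print(contract(p, tensor(p, p2, 1.0)))            # transverse structure
print(contract(p, tensor(p, p2 + cutoff2, 1.0)))  # extra Lambda^2 * eta term
```

The second contraction is $\Lambda^2 p_\nu$, which is the kind of term a gauge-violating counterterm would have to remove.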
The best solution is to use a regularization scheme that respects all the symmetries. Dim reg is usually a good choice. If you can't find such a regularization scheme, that may be a sign that the symmetry is anomalous and can't be maintained at the quantum level (e.g., massless fermions have a chiral symmetry that is anomalous when you couple them to gauge fields).
If you insist on using a different regularization scheme, you can still get the right answer, but it will take a little more work. You will need to add counterterms that violate the symmetry. However, the real question is whether the physical quantities you compute (such as the S-matrix) respect the symmetry after you renormalize. You will end up finding, if you do things correctly, that the failure of the counterterms to respect the symmetry exactly cancels the failure of the divergences to respect the symmetry, and the final answer will be symmetric. I would consult Weinberg if you want to get a precise prescription on how to proceed.
The lesson is that to renormalize, you simply add the counterterms you need to cancel the divergences you find, without thinking about what they mean. Later on, you may need to do some re-interpreting to figure out exactly what the physics is.
Also, it is true that this interaction is non-renormalizable, so you will end up generating an infinite number of counterterms. However, depending on what you are trying to do, that's not necessarily as bad as it sounds.
What Gross means is that QCD is well defined in the ultraviolet: if you take a lattice version and send the lattice spacing to zero, the coupling does not diverge. Instead, the coupling goes to zero as the inverse logarithm of the lattice spacing, so very slowly.
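To get a feel for how slow "inverse logarithm" is, here is a rough sketch using the standard one-loop formula $\alpha_s(Q)=4\pi/\left[\beta_0\ln(Q^2/\Lambda^2)\right]$ with $\beta_0=11-2n_f/3$, and identifying the scale $Q$ with the inverse lattice spacing $1/a$. The values $n_f=3$ and $\Lambda=0.2$ GeV are assumptions chosen only for illustration, and the one-loop formula is not the lattice bare coupling beyond leading order.

```python
import math

def alpha_s_one_loop(a, n_f=3, lam=0.2):
    """One-loop coupling at scale Q = 1/a (a in GeV^-1, lam in GeV); needs a * lam < 1."""
    beta0 = 11 - 2 * n_f / 3
    return 4 * math.pi / (beta0 * math.log(1 / (a * lam) ** 2))

# Shrink the lattice spacing by many orders of magnitude.
for a in (1e-1, 1e-2, 1e-4, 1e-8):
    print(a, alpha_s_one_loop(a))
```

Even when $a$ shrinks by seven orders of magnitude, the coupling only drops by a factor of a few, which is the "very slowly" in the text.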
This doesn't mean that QCD perturbation theory has no ultraviolet divergences; it has them like any other unitary interacting field theory in 4d. These ultraviolet divergences, though, are not a sign of a problem with the theory, since the lattice definition works fine. This is in contrast to, say, QED, where the short lattice spacing limit requires the bare coupling to blow up, and it is likely that the theory runs to infinite coupling at some small but finite distance. This is certainly what happens in the simplest interacting field theory, the quartically self-interacting scalar.
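For comparison, the perturbative hint of this blow-up in QED can be sketched with the one-loop running coupling for a single electron loop, $\alpha(Q)=\alpha_0/\left[1-\frac{\alpha_0}{3\pi}\ln(Q^2/m_e^2)\right]$, whose denominator vanishes at $Q=m_e\,e^{3\pi/(2\alpha_0)}$. This is only a one-loop estimate with one charged fermion, an assumption made for illustration, and it says nothing rigorous about the non-perturbative theory.

```python
import math

ALPHA_0 = 1 / 137.035999   # fine-structure constant at the electron mass scale
M_E_GEV = 0.000511         # electron mass in GeV

def alpha_one_loop(q_gev):
    """One-loop QED coupling with a single electron loop, valid below the pole."""
    log_term = math.log(q_gev ** 2 / M_E_GEV ** 2)
    return ALPHA_0 / (1 - (ALPHA_0 / (3 * math.pi)) * log_term)

def log10_landau_pole_gev():
    """log10 of the scale (in GeV) where the one-loop denominator vanishes."""
    return (math.log(M_E_GEV) + 3 * math.pi / (2 * ALPHA_0)) / math.log(10)

print(alpha_one_loop(91.0))        # coupling grows with energy
print(log10_landau_pole_gev())     # exponent of the pole scale in GeV
```

The coupling increases with energy, the opposite of the QCD behaviour, and the one-loop pole sits at an absurdly high scale, equivalently an absurdly small distance.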
There is no proof that the limit of small lattice spacing gives a proper continuum limit for QCD, but the difficulties are of a stupid technical nature; there is absolutely no doubt that it is true. The full proof will require a better handle on the best way to define continuum limits for statistically fluctuating fields within mathematics.