You're totally right. The Wikipedia definition of renormalization is obsolete, i.e. it refers to the interpretation of these techniques that was believed before the discovery of the renormalization group.
While the computational essence (and results) of the techniques hasn't changed much in some cases, their modern interpretation is very different from the old one. The process of guaranteeing that results are expressed in terms of finite numbers is known as regularization, not renormalization, and integrating only up to a finite cutoff scale is a simple example of a regularization.
Renormalization, by contrast, is an extra step applied afterwards, in which a number of calculated quantities are set equal to their measured (and therefore finite) values. This of course cancels the infinite (calculated) parts of these quantities (I mean the parts that were infinite before the regularization), but for renormalizable theories it cancels the infinite parts of all physically meaningful predictions, too.
Moreover, renormalization has to be done even in theories where no divergences arise. In that case, it still amounts to a correct (yet nontrivial) mapping between the observed parameters and the "bare" parameters of the theory.
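To see the two steps side by side, here is a minimal numerical sketch. It assumes a toy model in which the calculated coupling obeys 1/g(mu) = 1/g_bare + b*ln(cutoff/mu); this form, the coefficient `B`, and all function names are hypothetical choices made for illustration, not taken from any particular theory. The cutoff is the regularization; fixing `g_bare` from one measured value is the renormalization.

```python
import math

B = 0.1  # hypothetical loop coefficient of the toy model


def dressed_coupling(g_bare, cutoff, mu, b=B):
    """Calculated coupling at scale mu, with the loop integral cut off at `cutoff`.

    Toy assumption: 1/g(mu) = 1/g_bare + b * ln(cutoff / mu).
    """
    return 1.0 / (1.0 / g_bare + b * math.log(cutoff / mu))


def bare_from_measurement(g_measured, cutoff, mu, b=B):
    """Renormalization step: pick g_bare so the calculation reproduces g_measured at mu."""
    return 1.0 / (1.0 / g_measured - b * math.log(cutoff / mu))


def predict(g_measured, mu_measured, mu_target, cutoff, b=B):
    """Return (g_bare, predicted coupling at mu_target) for a given cutoff."""
    g_bare = bare_from_measurement(g_measured, cutoff, mu_measured, b)
    return g_bare, dressed_coupling(g_bare, cutoff, mu_target, b)


if __name__ == "__main__":
    # Same measurement (g = 0.3 at mu = 1), very different cutoffs.
    for cutoff in (1e2, 1e4, 1e8):
        g_bare, g_pred = predict(0.3, 1.0, 10.0, cutoff)
        print(f"cutoff={cutoff:.0e}  g_bare={g_bare:.4f}  g(mu=10)={g_pred:.6f}")
```

In this toy model the bare coupling changes with the cutoff, while the prediction at the second scale does not: the cutoff dependence drops out once everything is expressed through the measured value. The cutoff must stay below the scale where 1/g_bare would turn negative, which for these numbers is far above 1e8.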
The modern, RG-based interpretation of these issues changes many subtleties. For example, the problem with a non-renormalizable theory is no longer the impossibility of cancelling the infinities. The infinities may still be regulated away by a regularization, but the real problem is that we introduce an infinite number of undetermined finite parameters during the process. In other words, a non-renormalizable theory becomes unpredictive (infinite input is needed to make it predictive) for all questions near (and above?) its cutoff scale, where its generic interactions (higher-order terms) become strongly coupled.
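The last point can be made a bit more concrete with the standard power-counting sketch, assuming four spacetime dimensions and a single heavy scale $\Lambda$. The operators $\mathcal{O}_i$, their mass dimensions $\Delta_i$, and the dimensionless coefficients $c_i$ are notation introduced here for illustration:

$$\mathcal{L}_{\rm eff} = \mathcal{L}_{\Delta \le 4} + \sum_i \frac{c_i}{\Lambda^{\Delta_i - 4}}\,\mathcal{O}_i, \qquad \Delta_i > 4 .$$

At energy $E$ the operator $\mathcal{O}_i$ then contributes to amplitudes with relative size of order $c_i\,(E/\Lambda)^{\Delta_i - 4}$. For $E \ll \Lambda$ only finitely many of the $c_i$ matter at any fixed accuracy, so the theory is still predictive there; as $E \to \Lambda$ all of them matter at once.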
1) Is this (the second type of counterterms) the only way that power divergences will be cancelled?
Yes, you need to add counterterms that look like the divergences you find. So in this case you are exactly right: you need to add a counterterm corresponding to $\phi^2 (\partial\phi)^2$.
2) If yes, are we always able to set counterterms with respect to the symmetries of the system in this way, so that power divergences can always be cancelled at all scales (then we do not have to worry about power divergences at all!)?
I'm not sure what symmetry you are worried about here, but often the power-law divergences break the symmetries you started with, so cancelling them requires counterterms that break those symmetries too. In fact, this is precisely why people tend to focus on the logs, or use regularization methods like dim reg, where the power laws don't appear and you don't have this annoying problem.
The general principle is that you will need counterterms that violate the symmetry you started with if your regularization method itself breaks the symmetry. An example is QED. Regulating with a hard cutoff violates gauge invariance, so the power-law divergences also violate gauge invariance. (Just compute the one-loop correction to the photon two-point function: gauge invariance tells you it should be proportional to $p^2\eta_{\mu\nu}-p_\mu p_\nu$, but the power-law divergences will give different relative contributions to those two terms.)
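Schematically, the hard-cutoff result for the one-loop photon self-energy has the structure below. The coefficient $c$ is a placeholder that is not computed here (its value depends on exactly how the cutoff is imposed), and the dots stand for further regulator-dependent pieces:

$$\Pi_{\mu\nu}(p) = \left(p^2\eta_{\mu\nu} - p_\mu p_\nu\right)\Pi(p^2) + c\,e^2\Lambda^2\,\eta_{\mu\nu} + \dots$$

The first term is transverse, $p^\mu\left(p^2\eta_{\mu\nu} - p_\mu p_\nu\right) = 0$, as gauge invariance requires. The second gives $p^\mu\Pi_{\mu\nu} \supset c\,e^2\Lambda^2\,p_\nu \neq 0$ and looks like a photon mass term, so cancelling it takes a counterterm proportional to $\Lambda^2 A_\mu A^\mu$, which is not gauge invariant.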
The best solution is to use a regularization scheme that respects all the symmetries. Dim reg is usually a good choice. If you can't find such a regularization scheme, that may be a sign that the symmetry is anomalous and can't be maintained at the quantum level (e.g., massless fermions have a chiral symmetry that is anomalous when you couple them to gauge fields).
If you insist on using a different regularization scheme, you can still get the right answer, but it will take a little more work. You will need to add counterterms that violate the symmetry. However, the real question is whether the physical quantities you compute after renormalizing (such as the S-matrix) respect the symmetry. You will find, if you do things correctly, that the failure of the counterterms to respect the symmetry exactly cancels the failure of the divergences to respect the symmetry, and the final answer is symmetric. I would consult Weinberg if you want a precise prescription for how to proceed.
The lesson is that to renormalize, you simply add the counterterms you need to cancel the divergences you find, without thinking about what they mean. Later on, you may need to do some re-interpreting to figure out exactly what the physics is.
Also, it is true that this interaction is non-renormalizable, so you will end up generating an infinite number of counterterms. However, depending on what you are trying to do, that's not necessarily as bad as it sounds.
The standard nonperturbative way (which has provided rigorous constructions of QFTs in 1+1D and 1+2D) is to construct the Euclidean (imaginary-time) field theory as a limit of lattice theories, and then to continue analytically to real time via the Osterwalder-Schrader reconstruction theorem.
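As a rough sketch of what "a limit of lattice theories" means, take a scalar $\phi^4$ theory (chosen here purely as an illustration) on a hypercubic lattice with spacing $a$ in $d$ Euclidean dimensions, restricted to a finite box. The bare parameters $m_0(a)$ and $\lambda_0(a)$ are allowed to depend on $a$:

$$S_a[\phi] = a^d \sum_{x} \left[ \frac12 \sum_{\mu=1}^{d} \left(\frac{\phi(x + a\hat\mu) - \phi(x)}{a}\right)^2 + \frac12 m_0^2(a)\,\phi(x)^2 + \frac{\lambda_0(a)}{4!}\,\phi(x)^4 \right], \qquad \langle F \rangle_a = \frac{1}{Z_a} \int \prod_x d\phi(x)\, F[\phi]\, e^{-S_a[\phi]} .$$

At fixed $a$ and finite volume this is an ordinary finite-dimensional integral. The continuum theory is defined by the limit $a \to 0$ (together with infinite volume) of the correlation functions, with the bare parameters tuned so that the limit exists; the Osterwalder-Schrader conditions are then checked on the limiting Euclidean correlation functions.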
In 1+3D, there is so far no rigorous construction of an interacting QFT, but neither is there a corresponding no-go theorem.
In 1+1D, there are also lots of exactly solvable QFTs, where the nonperturbative solution is obtained by the quantum inverse scattering method.
http://en.wikipedia.org/wiki/Quantum_inverse_scattering_method