Field theories are nonlinear because the quantum fields satisfy nonlinear dynamical equations.
But renormalization does not turn quantum fields into nonlinear functionals of test functions. (The Wightman distributions are, by definition, linear functionals of the test functions, and Wightman distributions always encode renormalized fields.)
Instead it changes the space of test functions to one where the interacting quantum fields are perturbatively well-defined. This gives a family of representations of the field algebra depending on an energy scale. All these representations are equivalent, due to the renormalization group, and the corresponding Wightman functions are independent of the renormalization energy. (In simpler, exactly solvable toy examples that need infinite renormalization, this can actually be checked.)
The dependence on the energy scale would not be present if the contributions to all orders were summed up (though nobody has the slightest idea how to do this nonperturbative step). The energy scale is simply a redundant parameter that influences the approximations calculated by perturbation theory.
The renormalization group is an exact but unobservable symmetry (just like gauge symmetry) that removes this extra freedom. But just as computations in a fixed gauge may spoil gauge independence numerically, computations at a fixed energy scale spoil renormalization group invariance numerically.
Note that Wightman functions are in principle observable. Indeed, the Kadanoff-Baym equations, the equations modeling high-energy heavy-ion collision experiments, are dynamical equations for the 2-particle Wightman functions and their ordered analogues.
[added 22.01.2018] In the above, the renormalization group refers to the group defined by Stückelberg and Bogoliubov, not to that by Kadanoff and Wilson, which is only a semigroup. See here.
First, I will translate the relevant passages in your paper from mathematese.
The argument in your reference
You are studying an X-Y model with the constraint that neighboring spins always have to be within a certain angle of each other. You define the collection of statistical-mechanics Gibbs distributions using a given boundary condition at infinity, as the boundaries get further and further away. Then you note that if the field at the boundary makes the spin turn around from top to bottom by the maximum possible amount, then the spins are locked in place: they can't move, because they need to make a certain winding, and unless they are at the maximum possible angle, they can't make the winding.
Using these boundary conditions, there is no free energy, there is no thermodynamics, there is no spin-wave limit, and the Mermin-Wagner theorem fails.
You also claim that the theorem fails with a translation-invariant measure, which is just given by averaging the same thing over different centers. You attempt to make the thing more physical by allowing the boundary condition to fluctuate around the mean by a small amount $\delta$. But in order to keep the boundary winding condition tight as the size of the box $N$ goes to infinity, $\delta$ must shrink as $1/N$, and the resulting free energy of your configuration will always be subextensive in the infinite-system limit. If $\delta$ does not shrink, the configurations will always randomize their angles, as the Mermin-Wagner theorem says.
The failures of the Mermin-Wagner theorem are all coming from this physically impossible boundary situation, not really from the singular potentials. By forcing the number of allowed configurations to be exactly 1 for all intents and purposes, you are creating a situation where each different average value of the angle has a completely disjoint representative in the thermodynamic limit. This makes the energy as a function of the average angle discontinuous (actually, the energy is infinite except for near one configuration), and makes it impossible to set up spin waves.
This type of argument has a 1d analog, where the analog to Mermin-Wagner is much easier to prove.
1-dimensional mechanical analogy
To see that this result isn't Mermin-Wagner's fault, consider the much easier one-dimensional theorem--- there can be no 1d solid (long range translational order). If you make a potential between points which is infinite at a certain distance D, you can break this theorem too.
What you do is you impose the condition that there are N particles, and the N-th particle is at a distance ND from the first. Then the particles are forced to be right on the edge of the infinite well, and you get the same violation: you form a 1d crystal only by imposing boundary conditions on a translation invariant potential.
The argument in 1d that there can be no crystal order comes from noting that a local defect will shift the average position arbitrarily far out, so as you add more defects, you will wash out the positional order.
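The washing-out of positional order can be illustrated with a minimal numerical sketch. It assumes a hypothetical toy model that is not taken from the paper: each nearest-neighbor spacing is drawn independently and uniformly from $[0, D]$, which respects the infinite wall at distance $D$. The function names and parameters are illustrative only.

```python
import random
import statistics


def chain_position_variance(n_particles, max_spacing=1.0, n_samples=2000, seed=0):
    """Sample variance of the last particle's position in a toy 1d chain.

    Assumption (illustrative): spacings are independent and uniform on
    [0, max_spacing], so neighbors never exceed the hard-wall distance D.
    """
    rng = random.Random(seed)
    positions = []
    for _ in range(n_samples):
        # The last particle sits at the sum of n_particles - 1 random spacings.
        x = sum(rng.uniform(0.0, max_spacing) for _ in range(n_particles - 1))
        positions.append(x)
    return statistics.variance(positions)


def locked_chain_positions(n_particles, max_spacing=1.0):
    """Positions when the total length is forced to its maximum value.

    Every spacing must then equal max_spacing exactly, so only one
    configuration survives and nothing fluctuates.
    """
    return [k * max_spacing for k in range(n_particles)]


short_chain = chain_position_variance(11)   # 10 spacings
long_chain = chain_position_variance(101)   # 100 spacings
print(short_chain, long_chain)
```

In this toy model the variance of the last position grows linearly with the number of spacings (each contributes $D^2/12$), so positional order is lost in a long free chain, while the maximally stretched chain has a single allowed configuration.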
Mermin-Wagner is not affected
The standard arguments for the Mermin-Wagner theorem do not need modification. They assume that there is an actual thermodynamic system, with a nonzero extensive free energy and an entropy proportional to the volume, and this is violated by your example. The case of exactly zero temperature is somewhat analogous: it has no extensive entropy, and at exactly zero temperature, you do break the symmetry.
If you have an extensive entropy, there is a marvelous overlap property which is central to how physicists demonstrate the smoothness of the macroscopic free energy. The Gibbs distributions at two infinitesimally separated angles sum over almost exactly the same configurations (in the sense that for a small enough angle, you can't tell locally that it changed, because the local fluctuations swamp the average, so the local configurations don't notice).
The enormous, nearly complete, overlap between the configurations at neighboring angles demonstrates that the thermodynamic average potentials are much much smoother than the possibly singular potentials that enter into the microscopic description. You always get a quadratic spin-wave density, including in the case of the model you mention, whenever you have an extensive free energy.
Once you have a quadratic spin-wave energy, the Mermin-Wagner theorem follows.
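This last step can be written out as a short sketch of the textbook estimate. The stiffness $K$, the lattice spacing $a$, and the system size $L$ are symbols introduced here for illustration (with $k_B = 1$); they do not appear in the paper under discussion. With a quadratic spin-wave energy, equipartition gives each Fourier mode a variance proportional to $T/(K k^2)$, so in two dimensions

$$
E[\theta] = \frac{K}{2}\int d^2x\,(\nabla\theta)^2
\quad\Longrightarrow\quad
\langle \theta(x)^2 \rangle
= \frac{T}{K}\int_{1/L}^{1/a}\frac{d^2k}{(2\pi)^2}\,\frac{1}{k^2}
= \frac{T}{2\pi K}\,\ln\frac{L}{a}.
$$

The angle fluctuation diverges logarithmically as $L \to \infty$ for any $T > 0$, so the angles randomize and there is no long-range order.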
Quick answer
The standard argument assumes that the Gibbs distributions for orientation $\theta$ and for orientation $\theta'$ always include locally overlapping configurations as $\theta$ approaches $\theta'$. This assumption fails in your example: even an infinitesimal change in the boundary angle changes the configurations completely, because they have no extensive entropy and are locked to within a $\delta$, shrinking with system size, of an unphysically constrained configuration.
Best Answer
I think the most transparent example is a phase transition: by definition, it is where some thermodynamic quantity does not behave well.
AFAIK, when Fourier showed that a discontinuous function can be represented as an infinite sum of continuous ones, he had a hard time convincing the people around him that he was not crazy. That story might partially answer your question: since any not-so-well-behaved function can be represented as a sum of smooth ones, there is not much difference, as long as well-formulated laws are linear. Functions that are really badly behaved usually do not appear in real problems. If they do, there is some significant physics behind it (as with a phase transition, a shock wave, etc.) and one cannot miss it.
For an operator, it is better (for a physicist) to think of a function of the operator as the function acting on its eigenvalues (if the operator is not diagonalizable, that counts as bad behaviour in physics). This is equivalent to the power series definition, but works for any function.
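Fourier's observation can be checked numerically with a minimal sketch. It uses the standard sine series of the unit square wave, $\operatorname{sign}(\sin x) = \frac{4}{\pi}\sum_{k\ge 0} \frac{\sin((2k+1)x)}{2k+1}$; the function name is illustrative only.

```python
import math


def square_wave_partial_sum(x, n_terms):
    """Partial Fourier sum of the unit square wave sign(sin x).

    Every term is a smooth sine, yet the sums approach a function
    that jumps from -1 to +1 at x = 0.
    """
    return (4.0 / math.pi) * sum(
        math.sin((2 * k + 1) * x) / (2 * k + 1) for k in range(n_terms)
    )


# Away from the jump, the partial sums approach +1 or -1.
print(square_wave_partial_sum(math.pi / 2, 2000))
print(square_wave_partial_sum(-math.pi / 2, 2000))
# At the jump itself, every partial sum vanishes (the midpoint of the jump).
print(square_wave_partial_sum(0.0, 50))
```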
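The eigenvalue picture can be sketched for a finite-dimensional Hermitian matrix. The example below assumes NumPy is available; the helper names and the $2\times 2$ matrix are hypothetical, chosen only for illustration. It compares the spectral definition of $\exp(A)$ with a truncated power series, and also applies the square root, a function with no power series around zero.

```python
import numpy as np


def apply_function_to_hermitian(matrix, func):
    """Compute f(A) = V f(Lambda) V^dagger for a Hermitian matrix A."""
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)
    # Scale column j of V by f(lambda_j), then rotate back.
    return (eigenvectors * func(eigenvalues)) @ eigenvectors.conj().T


def exp_power_series(matrix, n_terms=30):
    """Truncated power series sum_k A^k / k! for comparison."""
    result = np.zeros_like(matrix, dtype=float)
    term = np.eye(matrix.shape[0])
    for k in range(n_terms):
        result = result + term
        term = term @ matrix / (k + 1)
    return result


# A small symmetric, positive definite example matrix (illustrative).
A = np.array([[2.0, 1.0], [1.0, 3.0]])

exp_spectral = apply_function_to_hermitian(A, np.exp)
exp_series = exp_power_series(A)
sqrt_A = apply_function_to_hermitian(A, np.sqrt)

print(np.allclose(exp_spectral, exp_series))
print(np.allclose(sqrt_A @ sqrt_A, A))
```

The spectral route needs only the values of the function on the eigenvalues, which is why it extends to functions where a power series is unavailable or converges poorly.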