Can distribution theory be developed Riemann-free?

fa.functional-analysis · integration · schwartz-distributions

I imagine most people who frequent MO have been indoctrinated into the point of view that the Riemann integral can be safely discarded once one has taken the time to develop the Lebesgue integral. After all, the two integrals agree more or less whenever they are both defined, and the Lebesgue theory is well known to be more robust and flexible in a lot of important ways.

However, I have recently encountered an apparent counter-example to the extreme view (which perhaps nobody actually holds) that the Riemann integral is entirely dispensable as a technical tool. The context is the theory of distributions. It is not uncommon that when one wants to generalize an operation from test functions to distributions that there are two natural choices: the operation can either be defined "directly" or by specifying how it pairs with test functions. Here are two basic examples:

  • The first example involves the convolution of a distribution $F$ with a test function $\psi$. The direct definition is given by $F \ast \psi(x) = \langle F, \psi_x \rangle$ where $\psi_x(y) = \psi(x-y)$. The definition by pairing stipulates that for any test function $\phi$, $\langle F \ast \psi, \phi \rangle = \langle F, \phi \ast \psi_0 \rangle$.
  • The second example involves the Fourier transform of a (tempered) distribution $F$. The direct definition is given by $\hat{F}(\xi) = \langle F, e_\xi \rangle$ where $e_\xi(x) = e^{2 \pi i \xi x}$. The definition by pairing just sets $\langle \hat{F}, \psi \rangle = \langle F, \hat{\psi} \rangle$ for any appropriate test function $\psi$.
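For a regular tempered distribution, i.e. $F$ given by integrating against a function $f$, the agreement of the two Fourier-transform definitions can be sanity-checked numerically. Below is a minimal sketch (assuming `numpy`; the Gaussians are illustrative choices, and the grid quadrature is of course itself a Riemann-type sum, used here only as a check of the identity, never as a proof device):

```python
import numpy as np

# Quadrature grid for the sanity check.
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]

def pair(f_vals, g_vals):
    """Approximate the pairing <f, g> = integral of f(t) g(t) dt on the grid."""
    return np.sum(f_vals * g_vals) * dx

def fourier(f_vals):
    """F-hat(xi) = <f, e_xi> with e_xi(t) = e^{2 pi i xi t}, sampled at the grid points."""
    return np.array([pair(f_vals, np.exp(2j * np.pi * xi * x)) for xi in x])

# F is the regular tempered distribution given by f(x) = e^{-pi x^2};
# psi is another Schwartz test function (a shifted Gaussian, for concreteness).
f = np.exp(-np.pi * x**2)
psi = np.exp(-np.pi * (x - 1.0)**2)

lhs = pair(fourier(f), psi)   # <F-hat, psi>, with F-hat from the direct definition
rhs = pair(f, fourier(psi))   # <F, psi-hat>: the definition by pairing
assert abs(lhs - rhs) < 1e-6
```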

In both of these examples, and others like them, all of the authors that I have consulted (including Folland and Taylor) prove that the direct definition agrees with the definition by pairing by carrying out a calculation with Riemann sums.

So I am left wondering if there are decent proofs of these results for ordinary Lebesgue-abiding citizens. This question is a little problematic since the Lebesgue integral and the Riemann integral agree on the relevant space of functions; but if there isn't a good affirmative answer, then it seems to me that there should be a convincing explanation of why measure-theoretic tools aren't strong enough to make the argument work.

Best Answer

I will only consider the temperate situation (involving the spaces $\mathscr{S}$, $\mathscr{S}'$ and $\mathscr{O}_{\rm M}$) and I will only discuss the first example as well as a "dual" version with a derivative instead of an integral to be commuted with a distribution pairing. I will show that Riemann integration (and its dual, i.e., the classical definition of derivative as the limit of a ratio) not only is unnecessary but perhaps is a crime against Laurent Schwartz, or rather against the spirit of his theory of distributions.

Let us first go over some notational conventions. When talking about a distribution $F\in\mathscr{S}'(\mathbb{R}^n)$, I will instead say $F(x)\in\mathscr{S}'_x(\mathbb{R}^n)$ in order to view it as a generalized function $F(x)$ of a variable $x$ that must be named. This name will also appear as a subscript in the corresponding function space. In the same vein, let me now take a Schwartz function $\phi(x)\in\mathscr{S}_x(\mathbb{R}^n)$, meaning let me take an element $\phi\in\mathscr{S}(\mathbb{R}^n)$ but with extra emphasis on giving the variable a name, here $x$. I will not denote the distribution pairing by $\langle F,\phi\rangle$, which is rigorous but lacks expressive power, nor by the integral $$ \int_{\mathbb{R}^n}F(x)\phi(x)\ d^nx $$ which is not rigorous, but has the expressive power needed for "multilinear algebra" in infinite dimension. Instead, I will keep the best of both worlds and write $$ \langle F(x),\phi(x)\rangle_{x}\ . $$

These conventions settled, I can state the temperate variant of Fubini's Theorem for Distributions. Let $F(x)\in\mathscr{S}'_x(\mathbb{R}^m)$, let $G(y)\in\mathscr{S}'_y(\mathbb{R}^n)$, and let $\eta(x,y)\in\mathscr{S}_{x,y}(\mathbb{R}^{m+n})$. Then

a) $\langle G(y),\eta(x,y)\rangle_{y}\in\mathscr{S}_x(\mathbb{R}^m)$;

b) $\langle F(x),\eta(x,y)\rangle_{x}\in\mathscr{S}_y(\mathbb{R}^n)$;

c) we have the equalities $$ \langle F(x),\langle G(y), \eta(x,y)\rangle_{y}\rangle_{x} =\langle G(y),\langle F(x), \eta(x,y)\rangle_{x}\rangle_{y} =\langle (F\otimes G)(x,y),\eta(x,y)\rangle_{x,y} $$ where $(F\otimes G)(x,y)$ heuristically is the generalized function $F(x)G(y)$ of the composite variable $(x,y)$ in $\mathbb{R}^{m+n}$.
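When $F$ and $G$ are themselves regular distributions, part c) reduces to classical Fubini, and the three pairings can be compared directly on a grid. A minimal numerical sketch (assuming `numpy`; the specific $f$, $g$ and the non-product Schwartz function $\eta$ below are illustrative choices):

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 801)
y = np.linspace(-8.0, 8.0, 801)
dx, dy = x[1] - x[0], y[1] - y[0]
X, Y = np.meshgrid(x, y, indexing="ij")

f = np.exp(-x**2)                     # F: a regular tempered distribution on R_x
g = np.cos(y) * np.exp(-y**2)         # G: a regular tempered distribution on R_y
eta = np.exp(-(X**2 + X * Y + Y**2))  # eta(x,y): a (non-product) Schwartz function

inner_y = (eta * g[None, :]).sum(axis=1) * dy  # <G(y), eta(x,y)>_y, a function of x
inner_x = (eta * f[:, None]).sum(axis=0) * dx  # <F(x), eta(x,y)>_x, a function of y

lhs = (f * inner_y).sum() * dx                          # <F, <G, eta>_y>_x
rhs = (g * inner_x).sum() * dy                          # <G, <F, eta>_x>_y
both = (f[:, None] * g[None, :] * eta).sum() * dx * dy  # <F tensor G, eta>_{x,y}
assert abs(lhs - rhs) < 1e-8 and abs(lhs - both) < 1e-8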

One also has a stronger version of the theorem which includes:

d) The element of $\mathscr{S}_{x}(\mathbb{R}^{m})$ constructed in a) is a hypocontinuous bilinear function of $G$ in $\mathscr{S}'_{y}(\mathbb{R}^{n})$ and $\eta$ in $\mathscr{S}_{x,y}(\mathbb{R}^{m+n})$. Likewise, the element of $\mathscr{S}_{y}(\mathbb{R}^{n})$ constructed in b) is a hypocontinuous bilinear function of $F$ in $\mathscr{S}'_{x}(\mathbb{R}^{m})$ and $\eta$ in $\mathscr{S}_{x,y}(\mathbb{R}^{m+n})$.

Now the OP's first example asks us to show that for $F$ a temperate distribution on $\mathbb{R}^n$ and for $\psi,\phi$ Schwartz functions on $\mathbb{R}^n$, we have the identity $$ \int_{\mathbb{R}^n}\langle F(y),\psi(x-y)\rangle_{y}\ \phi(x)\ d^n x =\langle F(y),(\phi\ast\psi_0)(y)\rangle_{y}\ . $$

Low tech proof: Use finite Riemann sums to approximate the integral, which reduces the question to a trivial algebraic fact, i.e., linearity.

High tech proof: In my previous notations, the identity means $$ \langle 1(x),\langle F(y),\phi(x)\psi(x-y)\rangle_{y}\rangle_x =\langle F(y),\langle 1(x),\phi(x)\psi(x-y)\rangle_{x}\rangle_y $$ where the strange notation $1(x)$ is for the constant function of $x$ equal to 1 seen as a temperate distribution in the usual way (by Lebesgue integrating against it). Since $\eta(x,y)=\phi(x)\psi(x-y)\in\mathscr{S}_{x,y}(\mathbb{R}^{2n})$, this trivially follows from the algebraic fact/formula expressed by the above distributional Fubini theorem, part c).
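Either way, for a regular $F$ given by a function $f$ the identity is just a two-variable interchange, and it can be checked on a grid. A minimal sketch (assuming `numpy`; the Gaussian-type test functions are illustrative):

```python
import numpy as np

t = np.linspace(-8.0, 8.0, 801)
dt = t[1] - t[0]
X, Y = np.meshgrid(t, t, indexing="ij")  # X plays the role of x, Y of y

f = np.exp(-t**2)                    # F as the regular distribution given by f
psi = lambda u: np.exp(-2.0 * u**2)  # test function psi
phi = t * np.exp(-t**2)              # test function phi

conv = (psi(X - Y) * f[None, :]).sum(axis=1) * dt  # (F*psi)(x) = <F(y), psi(x-y)>_y
lhs = (conv * phi).sum() * dt                      # integral of (F*psi)(x) phi(x) dx

# (phi * psi_0)(y) = integral of phi(x) psi(x - y) dx, since psi_0(u) = psi(-u)
phi_psi0 = (psi(X - Y) * phi[:, None]).sum(axis=0) * dt
rhs = (f * phi_psi0).sum() * dt                    # <F(y), (phi*psi_0)(y)>_y
assert abs(lhs - rhs) < 1e-8
```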


Although not in the original question, let me also consider another "Stokes-Theorem/Fundamental-Theorem-of-Calculus"-dual result that everyone learns in a PDE course with distributions, regarding convolutions. In the same setting as above, we have that $(F\ast\psi)(x)$ is a smooth function of temperate growth which I will write as the statement $(F\ast\psi)(x)\in\mathscr{O}_{{\rm M},x}(\mathbb{R}^n)$.

Low tech proof: One uses induction, starting with a similar commutation identity, i.e., differentiation under the distributional pairing sign, so that the derivative hits the test function inside the pairing. Usually, one proves this by tedious estimates on derivatives, etc.

High tech proof: Take an arbitrary $\phi(x)\in \mathscr{S}_{x}(\mathbb{R}^{n})$ and consider $$ \phi(x)(F\ast\psi)(x)=\phi(x) \langle F(y),\psi(x-y)\rangle_{y}=\langle F(y),\phi(x)\psi(x-y)\rangle_{y}\ . $$ Again because $\eta(x,y)=\phi(x)\psi(x-y)\in\mathscr{S}_{x,y}(\mathbb{R}^{2n})$, the distributional Fubini, part a), with $F$ instead of $G$ gives that $$ \phi(x)(F\ast\psi)(x)\in \mathscr{S}_{x}(\mathbb{R}^n)\ . $$ This is true for all $\phi$, so by the multiplier space characterization of $\mathscr{O}_{\rm M}$, we have that the convolution is a smooth temperate function.
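The statement can also be probed numerically with a genuinely singular $F$. The sketch below (assuming `numpy`) takes $F=\delta_0'$, i.e. $\langle F,g\rangle=-g'(0)$ (implemented by a finite difference at the origin), for which the direct definition gives $(F\ast\psi)(x)=\psi'(x)$, visibly a smooth function of temperate growth:

```python
import numpy as np

t = np.linspace(-8.0, 8.0, 1601)
dt = t[1] - t[0]

# A genuinely singular tempered distribution: F = delta', i.e. <F, g> = -g'(0),
# approximated here by a central difference at the grid point closest to 0.
def pair_delta_prime(g_vals):
    i0 = np.argmin(np.abs(t))
    return -(g_vals[i0 + 1] - g_vals[i0 - 1]) / (2 * dt)

psi = lambda u: np.exp(-u**2) * np.sin(u)  # a Schwartz test function

# Direct definition: (F*psi)(x) = <F(y), psi(x-y)>_y, which should equal psi'(x).
conv = np.array([pair_delta_prime(psi(xv - t)) for xv in t])
dpsi_exact = np.exp(-t**2) * (np.cos(t) - 2 * t * np.sin(t))
assert np.max(np.abs(conv - dpsi_exact)) < 1e-3
```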

Remark 1: One can show the intermediate identity $\partial_z^\alpha (F\ast \psi)(z)=(F\ast\partial^{\alpha}\psi)(z)$ used in the low tech proof, with the same method as above by replacing $1(x)$ by a derivative of a delta function in $x$ located at $z$.
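For what it's worth, the intermediate identity of Remark 1 can itself be observed numerically for a regular $F$: the finite-difference derivative of $F\ast\psi$ matches $F\ast\psi'$. A sketch (assuming `numpy`; $f$ and $\psi$ are illustrative Gaussians, and the finite-difference tolerance reflects the $O(dt^2)$ truncation error):

```python
import numpy as np

t = np.linspace(-8.0, 8.0, 1601)
dt = t[1] - t[0]
X, Y = np.meshgrid(t, t, indexing="ij")

f = np.exp(-t**2)                        # F as a regular tempered distribution
psi = lambda u: np.exp(-u**2)            # test function psi
dpsi = lambda u: -2 * u * np.exp(-u**2)  # its exact derivative psi'

conv = (psi(X - Y) * f[None, :]).sum(axis=1) * dt       # (F*psi)(x)
conv_dpsi = (dpsi(X - Y) * f[None, :]).sum(axis=1) * dt  # (F*psi')(x)

# A central finite difference of F*psi should match F*psi' away from the grid ends.
fd = (conv[2:] - conv[:-2]) / (2 * dt)
assert np.max(np.abs(fd - conv_dpsi[1:-1])) < 1e-3
```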

Remark 2: This is pure algebra.


Addendum: The map $\mathscr{S}'\times\mathscr{S}\rightarrow\mathscr{O}_{\rm M}$, $(F,\psi)\mapsto (F\ast \psi)$ is hypocontinuous. This follows from the newly added part d) above of Fubini's Theorem for distributions and from the topology of $\mathscr{O}_{\rm M}$ being defined by the seminorms $||\phi \cdot||$ where $\phi$ ranges over $\mathscr{S}$ and $||\cdot||$ ranges over continuous seminorms of $\mathscr{S}$. Since convergent sequences form bounded sets, this shows the construction is jointly sequentially continuous.

Also note that proving Fubini's Theorem for distributions (and the Kernel Theorem etc.) is very easy, if one does what algebraists say one should never do: take a (Schauder) basis, e.g., that of Hermite functions. Some hints about this are given in my answer

https://math.stackexchange.com/questions/3512357/understanding-the-proof-of-schwartz-kernel-theorem/3512932#3512932

with some more pointers given in

https://math.stackexchange.com/questions/2623515/schwartz-kernel-theorem-and-dual-topologies/2647815#2647815
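To illustrate the basis-friendly point of view concretely: a Schwartz function has rapidly decreasing Hermite coefficients, and its Hermite series converges fast. A minimal numerical sketch (assuming `numpy`; the expanded function and truncation order are illustrative choices):

```python
import numpy as np
from math import factorial, pi, sqrt
from numpy.polynomial.hermite import hermval

x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]

def hermite_fn(n, pts):
    """L^2-normalized Hermite function h_n = H_n(x) e^{-x^2/2} / sqrt(2^n n! sqrt(pi))."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return hermval(pts, coeffs) * np.exp(-pts**2 / 2) / sqrt(2.0**n * factorial(n) * sqrt(pi))

f = np.exp(-x**2 / 2) * np.cos(x)  # a Schwartz function to expand
N = 40
H = np.array([hermite_fn(n, x) for n in range(N)])
c = H @ f * dx   # Hermite coefficients c_n = <f, h_n>, by grid quadrature
recon = c @ H    # partial sum of the Hermite series

assert np.max(np.abs(recon - f)) < 1e-6  # the expansion converges rapidly
assert np.max(np.abs(c[20:])) < 1e-8     # coefficients decay rapidly (f is Schwartz)
```

Testing membership in $\mathscr{S}$ or $\mathscr{S}'$, and proving statements like the Fubini or Kernel Theorems, then becomes a matter of polynomial bounds on such coefficient sequences.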

Finally, to see more involved applications of this kind of reasoning with distributions, you can also have a look at my recent article "A Second-Quantized Kolmogorov–Chentsov Theorem via the Operator Product Expansion" about pointwise multiplication for random Schwartz distributions.
