Is a functional derivative a generalized function?

functional-analysis

I am just now learning about elementary distribution theory, and it seems that theory may bear on the topic of functional differentiation, which I've encountered in some books on quantum field theory (QFT). I'm not looking for mathematical rigor, but just a bit more information to help me understand functional differentiation.

One definition I've seen of the functional derivative $δJ[f]/δf(x)$ of a functional $J[f]$ is the implicit definition
$$\int_{-\infty}^{+\infty} \frac{δJ[f]}{δf(\xi)}\, η(\xi)\, d\xi = \lim_{ε→0} \frac{J[f+εη]-J[f]}{ε}.$$
Here $η$ is the variation of the function $f$, which in turn is the argument of the functional $J$. The QFT books seemed to regard the functional derivative as an ordinary function of position $x$, but it seems to me that it might be a generalized function (distribution); such objects do not necessarily have numeric values at specific positions $x$.
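For a concrete instance, when I try this definition on a linear functional $J[f] = \int_{-\infty}^{+\infty} g(\xi) f(\xi)\, d\xi$, with $g$ a fixed continuous function, I seem to get $δJ[f]/δf(x) = g(x)$, an ordinary function, but I do not know whether that is typical.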

  1. Is $δJ[f]/δf(x)$ an ordinary function or a generalized function?
  2. What kind of functions are allowed as variations $η$? It seems to me that, as in distribution theory, one might want to allow only infinitely differentiable functions, perhaps with compact support, or perhaps ones that decrease very rapidly at infinity.
  3. One QFT book I looked at exhibited an explicit definition of $δJ[f]/δf(x)$, apparently obtained from the above definition by setting $η(y) = δ(y-x)$, the Dirac delta "function" representing a unit impulse at $x$. Is that a legitimate choice for the variation $η$? Certainly the delta function is not an allowed test function in elementary distribution theory. Other than that, what kind of problems would choosing $η(y) = δ(y-x)$ create?

Best Answer

There's not been any significant response to my questions, but I think I've figured them out. For anyone who's interested:

  1. Let $J$ map ordinary continuous functions $f$, definable pointwise, to real numbers $J[f]$, which is my understanding of a functional $J$. Then $J$'s functional derivative is some generalized function (a.k.a. distribution), usually denoted $\frac {δJ[f]}{δf}$ in the physics textbooks. Let $\langle\Phi,\eta\rangle$ denote the action of a generalized function $\Phi$ on a test function $\eta$. By definition, for each test function $\eta$ one must have
    $$\langle \frac {δJ[f]}{δf},\eta\rangle = \lim_{\epsilon \to 0} \frac {J[f + \epsilon \eta] - J[f]}{\epsilon}.$$ That $J$'s derivative in general must be a generalized function rather than a pointwise-defined function is easily seen from the following simple example: For continuous functions $f$, let $J[f] = f(a)$, where $a$ is some fixed number. Then for each test function $\eta$, $$\langle \frac {δJ[f]}{δf},\eta\rangle = \lim_{\epsilon \to 0} \frac {J[f + \epsilon \eta] - J[f]}{\epsilon} = \lim_{\epsilon \to 0} \frac {[f(a) + \epsilon \eta(a)] - f(a)}{\epsilon} = \eta (a) = \langle{\delta}_a,\eta\rangle,$$ so $\frac {δJ[f]}{δf} = {\delta}_a$ = "the Dirac delta function at $a$". As is well known, Dirac delta functions are generalized functions incapable of representation as pointwise-defined functions. So functional derivatives cannot be limited to ordinary functions and must instead be allowed to range over the generalized functions.

  2. If a functional derivative $\frac {δJ[f]}{δf}$ is representable as an ordinary function, then its value at $x$ is denoted $\frac {δJ[f]}{δf(x)}$. The action of the functional derivative on a test function may then be written as an integral $$ \langle \frac {δJ[f]}{δf},\eta\rangle = \int_{-\infty}^{+\infty} \frac {δJ[f]}{δf(\xi)} \eta (\xi) d\xi = \lim_{\epsilon \to 0} \frac {J[f + \epsilon \eta] - J[f]}{\epsilon}. $$ As the test functions $\eta$ may be chosen arbitrarily, this constitutes an implicit definition of $\frac {δJ[f]}{δf(x)}$. (A worked example of this case is given at the end of this answer.)

  3. Many physics texts substitute the Dirac delta function ${\delta}_x$ in for the test function $\eta$ and claim to thereby obtain an explicit equation $$(*)\qquad \frac {δJ[f]}{δf(x)} = \int_{-\infty}^{+\infty} \frac {δJ[f]}{δf(\xi)} {\delta}_x (\xi) d\xi = \lim_{\epsilon \to 0} \frac {J[f + {\epsilon \delta}_x] - J[f]}{\epsilon} $$
    for $\frac {δJ[f]}{δf(x)}$. Such a procedure is not legitimate, for test functions must be $C^{\infty}$ ordinary functions, which ${\delta}_x$ is not. Additionally, it doesn't make much sense to vary a continuous function $f$ only at a single point $x$. However, one can vary $f$ in an arbitrarily small neighborhood of $x$, and doing so is the key to interpreting $(*)$. Instead of the substitution $\eta = {\delta}_x$, substitute in a sequence of functions $\eta_{x,k}$ which converge (in the sense of generalized functions) to ${\delta}_x$, and then take the limit $k \to \infty$. (If one wants to be definite, take a sequence of Gaussians centered on $x$ with standard deviations shrinking to 0 as $k \to \infty$.) One obtains $$(**)\qquad \frac {δJ[f]}{δf(x)} = \lim_{k \to \infty} \int_{-\infty}^{+\infty} \frac {δJ[f]}{δf(\xi)} {\eta}_{x,k}(\xi)d\xi = \lim_{k \to \infty} \lim_{\epsilon \to 0} \frac {J[f + \epsilon {\eta}_{x,k}] - J[f]}{\epsilon}. $$ This expression suggests three rules which make $(*)$ computationally useful in deducing the functional derivative at $x$ for specific functionals. The first rule reflects the fact that in $(**)$, before either limiting process is carried out, the test function ${\eta}_{x,k}$ is a perfectly well-behaved ordinary function. In $(**)$, the process $\lim_{k \to \infty}$, which effectively transforms ${\eta}_{x,k}$ into ${\delta}_x$, is carried out after the $\lim_{\epsilon \to 0}$ process; this is the source of the second rule. Eq. $(**)$'s last operation is $\lim_{k \to \infty}$; since this effectively transforms ${\eta}_{x,k}$ into ${\delta}_x$, the third rule makes employment of ${\delta}_x$'s special properties the last thing one does.
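
As a quick check of $(**)$ (my own illustration, not taken from any text), take the functional $J[f] = f(a)$ from item 1 and let $\eta_{x,k}$ be the Gaussians just mentioned. The inner limit gives $$\lim_{\epsilon \to 0} \frac {J[f + \epsilon \eta_{x,k}] - J[f]}{\epsilon} = \eta_{x,k}(a),$$ and, regarded as a function of $x$, $\eta_{x,k}(a)$ converges (in the sense of generalized functions) to $\delta(x-a)$ as $k \to \infty$. So $(**)$ reproduces item 1's result $\frac {δJ[f]}{δf} = {\delta}_a$, whose "value" $\frac {δJ[f]}{δf(x)}$ exists only as a generalized function. The rules below codify the order of operations that $(**)$ imposes when one works directly with $(*)$.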

Rule 1. When computing the right hand side of $(*)$, first compute the difference quotient as if ${\delta}_x$ were a test function. Do not yet invoke properties peculiar to delta functions, such as the sifting property.

Rule 2. After having carried the calculation of the difference quotient as far as permitted under Rule 1, take the limit $\epsilon \to 0$. As one does so, continue to treat ${\delta}_x$ as if it were a test function. In particular, terms such as $\epsilon {\delta}_x$, $\epsilon {\delta}_x^2$, etc. vanish as $\epsilon \to 0$.

Rule 3. Lastly, after having taken the calculation as far as Rules 1 and 2 permit when computing the right hand side of $(*)$, use the properties peculiar to ${\delta}_x$ as a generalized function, such as the sifting property or the integration by parts property $\langle{\Phi}',\eta\rangle = -\langle{\Phi},{\eta}'\rangle$.

These rules for making $(*)$ useful were inspired by some remarks by Walter Greiner in his Field Quantization (Springer-Verlag 1996), pp. 36-39. Those remarks were vague and without any obvious or explicit justification. I hit upon the above explanation in trying to make sense of them. So far $(*)$, when guided by the three rules, has worked quite well as a tool for computing the derivatives of specific functionals.

When computing the derivative of a nonlinear functional, use of $(*)$ without guidance from the three rules almost always turns into a disaster, as crazy things like $\epsilon \delta(0)$ and ${\delta}_x^2$ occur in one's calculations.
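
To illustrate both item 2 and the three rules, here is a worked example of my own (not taken from Greiner): let $J[f] = \int_{-\infty}^{+\infty} f(\xi)^2 d\xi$. For an ordinary test function $\eta$, the implicit definition of item 2 gives $$\lim_{\epsilon \to 0} \frac {J[f + \epsilon \eta] - J[f]}{\epsilon} = \lim_{\epsilon \to 0} \int_{-\infty}^{+\infty} \left( 2 f(\xi) \eta(\xi) + \epsilon\, \eta(\xi)^2 \right) d\xi = \int_{-\infty}^{+\infty} 2 f(\xi) \eta(\xi)\, d\xi,$$ so $\frac {δJ[f]}{δf(x)} = 2 f(x)$, an ordinary function. The same result comes from $(*)$ when guided by the rules: following Rule 1, the difference quotient with $\eta = {\delta}_x$ is $\int \left( 2 f(\xi) {\delta}_x(\xi) + \epsilon\, {\delta}_x(\xi)^2 \right) d\xi$; by Rule 2, the $\epsilon\, {\delta}_x^2$ term is discarded as $\epsilon \to 0$; only then, by Rule 3, is the sifting property invoked, giving $\frac {δJ[f]}{δf(x)} = 2 f(x)$ again. Had the sifting property been used before taking $\epsilon \to 0$, the second term would have become the meaningless expression $\epsilon\, \delta(0)$, precisely the sort of disaster mentioned above.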
