Integration by parts for a possibly infinite integrand

definite-integrals, integration, lebesgue-integral, real-analysis

Let us define the Gaussian density $p(x) := \frac{1}{\sqrt{2 \pi}}e^{- \frac{x^2}{2}}$.
If $f$ is a differentiable function with bounded support, it is not difficult to prove, using the integration by parts formula, that
$$
\int_{- \infty}^{+ \infty} dx f^\prime(x) p(x) = \int_{- \infty}^{+ \infty} dx f(x) x \, p(x)
$$

where $f^\prime$ denotes the first derivative and we used the fact that $p^\prime(x) = -x p(x)$. However, it seems difficult to me to make sense of the same formula for ANY differentiable function $f$, even one which grows so fast that one of the two integrals above is infinite. In this case we would also need to prove that the other integral is infinite in order to get the identity. Hence, don't we need some further light assumptions on the asymptotic behaviour of $f$? Does anyone have a hint? I am asking since a book states this identity and claims it holds for any differentiable function…

Best Answer

The precise conditions needed for both sides of the identity to be finite and equal to each other are provided by Stein's lemma, which also provides a converse. Let $p(t) = \frac{1}{\sqrt{2 \pi}} e^{-\frac{t^2} 2}$.

Let $$\mathcal H = \left\{f : \mathbb R \to \mathbb R : f \text{ is absolutely continuous, and } \int_{\mathbb R} |f'(t)|p(t)dt < \infty \right\}$$ Then for every $f \in \mathcal H$ we have: $$ \int_{\mathbb R} f'(t)p(t)dt = \int_{\mathbb R} tf(t)p(t)dt $$ Conversely, suppose that $q(t)$ is a valid probability density, i.e. a non-negative measurable function with $\int_{\mathbb R} q(t)dt = 1$. Then: $$ \int_{\mathbb R} f'(t)q(t)dt = \int_{\mathbb R} tf(t)q(t)dt \text{ for all } f \in \mathcal H \implies q = p $$ In other words, over the function space $\mathcal H$, the integration by parts formula provides a unique characterization of the Gaussian density.
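As a quick sanity check of the identity, here is a minimal numerical sketch. The helper names `simpson` and `stein_sides`, the cutoff $12$ and the two test functions $\sin x$ and $x^3$ are illustrative choices, not part of the lemma; both test functions lie in $\mathcal H$.

```python
import math

def p(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def simpson(g, a, b, n=20000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3.0

def stein_sides(f, df, cutoff=12.0):
    """Return (integral of f' p, integral of x f p) over [-cutoff, cutoff].

    The truncation is an assumption: it is harmless only when the
    Gaussian tails beyond the cutoff contribute negligibly.
    """
    lhs = simpson(lambda x: df(x) * p(x), -cutoff, cutoff)
    rhs = simpson(lambda x: x * f(x) * p(x), -cutoff, cutoff)
    return lhs, rhs

lhs_sin, rhs_sin = stein_sides(math.sin, math.cos)
lhs_cube, rhs_cube = stein_sides(lambda x: x**3, lambda x: 3 * x**2)
print(lhs_sin, rhs_sin)    # both close to exp(-1/2)
print(lhs_cube, rhs_cube)  # both close to 3
```

For $f(x) = x^3$ both sides equal the fourth moment $3$ of the standard normal, and for $f(x) = \sin x$ both equal $e^{-1/2}$.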


To give a proof that uses exactly these hypotheses, all we need to see is that IBP is a corollary of a "hidden" Fubini theorem.

$$ \int_{\mathbb R} p(t)f'(t)dt = \int_{0}^\infty f'(t)\left[ \int_{t}^\infty wp(w)dw\right]dt - \int_{-\infty}^0 f'(t)\left[\int_{-\infty}^t wp(w)dw\right]dt $$ because $p(t) = \int_{t}^\infty wp(w)dw$ for $t>0$ and $p(t) = -\int_{-\infty}^t wp(w)dw$ for $t<0$ (both are verified by the usual fundamental theorem of calculus together with $p'(w) = -wp(w)$; the minus sign appears because $wp(w) < 0$ for $w < 0$). Now, the conditions ensure that the Fubini theorem is applicable: the double integrals of $|f'(t)|\,|w|p(w)$ over the two regions add up to $\int_{\mathbb R} |f'(t)|p(t)dt$, which is finite by hypothesis. Thus, we have: $$ \begin{aligned} &\int_{0}^\infty f'(t)\left[ \int_{t}^\infty wp(w)dw\right]dt - \int_{-\infty}^0 f'(t)\left[\int_{-\infty}^t wp(w)dw\right]dt \\ &= \int_0^\infty wp(w) \left[\int_0^w f'(t)dt\right]dw - \int_{-\infty}^0 wp(w) \left[\int_w^0 f'(t)dt\right]dw \end{aligned} $$

The FTC for the Lebesgue integral tells us that $\int_0^w f'(t)dt = f(w) - f(0)$ for every $w$ (because $f$ is absolutely continuous), and similarly $\int_w^0 f'(t)dt = f(0) - f(w)$. The two terms involving $f(0)$ cancel because $\int_{\mathbb R} wp(w)dw = 0$, and combining the rest you land exactly on $\int_{-\infty}^\infty wf(w)p(w)dw$, as desired.
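The tail formula for $p$ used in the Fubini step can also be checked numerically. By the symmetry of $p$, on either half-line it amounts to $p(t) = \int_{|t|}^\infty wp(w)dw$. The sketch below tests this at a few points; the helper names `simpson` and `tail`, the sample points and the cutoff $12$ standing in for $\infty$ are assumptions made for illustration.

```python
import math

def p(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def simpson(g, a, b, n=20000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3.0

def tail(t, cutoff=12.0):
    """Approximate the integral of w * p(w) over [|t|, cutoff]."""
    return simpson(lambda w: w * p(w), abs(t), cutoff)

# Largest deviation between the tail integral and p(t) over sample points.
max_err = max(abs(tail(t) - p(t)) for t in (-2.0, -0.5, 0.3, 1.0, 3.0))
print(max_err)  # tiny, limited only by quadrature and truncation error
```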


Now, in many sources, the class $\mathcal H$ is not stated correctly, or is shortened to a smaller class of functions which admit the "classical" integration-by-parts formula. To be precise:

Let $f : \mathbb R \to \mathbb R$ be a differentiable function such that $p(x)f(x) \to 0$ as $|x| \to \infty$. Then if one of $\int_{-\infty}^{\infty} f'(x)p(x)dx$ or $\int_{-\infty}^{\infty} xf(x)p(x)dx$ is finite, then so is the other, and they are equal.

The proof would go via the classical integration-by-parts statement.
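For reference, here is a sketch of that classical step on a finite interval, under the extra assumption that $f'$ is continuous so that the integrals below are ordinary Riemann integrals; the cutoffs $A, B > 0$ are auxiliary and not part of the statement. Using $p'(x) = -xp(x)$:

$$ \int_{-A}^{B} f'(x)p(x)dx = f(B)p(B) - f(-A)p(-A) + \int_{-A}^{B} xf(x)p(x)dx $$

Letting $A, B \to \infty$, the boundary terms vanish by the hypothesis $p(x)f(x) \to 0$, so one side converges exactly when the other does, and to the same limit.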

Let's take some examples to clarify this: see the mathoverflow post here, which literally says "for any smooth function $f$". This is wrong, because the expectations won't exist for all smooth functions $f$. This paper goes a step further and literally writes that the conclusion holds for any $f$ that is integrable with respect to $p$. That is again wrong, because $f$ could be integrable with respect to $p$ but not absolutely continuous, and even if it were absolutely continuous, its a.e. derivative need not be integrable!

Therefore, it is your source that is confusing you here. Indeed, the equality can be taken in two ways: either in the sense that one side being infinite makes the other infinite as well, or with extra conditions on $f$ that make both sides finite, as you asked.


On the question of finiteness: we have already observed that the two sides of the equation are the left and right hand sides of a Fubini application. The important thing about a Fubini application is that if either side is finite, then so is the other. For example, suppose instead that I knew that $\int_{-\infty}^{\infty} |xf(x)|p(x)dx<\infty$. Then I could get the equality I need from knowing that Fubini applies, because the other side is finite.

Basically, both sides are finite or infinite together, provided of course that $f$ is absolutely continuous, in order for both sides to be at least formally definable. (For example, the right hand side could formally exist for $f$ not continuous, while the left hand side can't be defined).
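A concrete instance where both sides are infinite together (an illustrative choice of $f$, not taken from the sources mentioned above): take $f(x) = xe^{x^2/2}$, so that $f'(x) = (1+x^2)e^{x^2/2}$. Then

$$ \int_{\mathbb R} f'(x)p(x)dx = \frac{1}{\sqrt{2\pi}}\int_{\mathbb R} (1+x^2)dx = +\infty, \qquad \int_{\mathbb R} xf(x)p(x)dx = \frac{1}{\sqrt{2\pi}}\int_{\mathbb R} x^2dx = +\infty $$

Here $f(x)p(x) = x/\sqrt{2\pi}$ does not tend to $0$, and indeed the integrals truncated to $[-A, A]$ differ by the boundary term $2A/\sqrt{2\pi}$, which does not vanish as $A \to \infty$.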


To provide further motivation, there exist multi-dimensional formulas of the form discussed. These are roughly addressed by the "Gaussian integration-by-parts" formula, and also generalize to multi-dimensional Gaussian vectors (that is, not only standard normal). Stein's lemma grows into a method called Stein's method, which was used in the 1990s and 2000s to provide Central Limit Theorems and higher order corrections (think of the Berry-Esseen theorem) in different metrics such as the Wasserstein and $L^p$ distances, using higher moments of the random variables.

The principle behind that is the converse of the lemma, which comes back to the very profound idea of continuity. The normal random variable is the only one satisfying the IBP condition over $\mathcal H$. Stein's idea is basically that if a density function comes close to satisfying the IBP for functions in $\mathcal H$, then it must be close to the normal density. The precise details can be found in the survey "Fundamentals of Stein's method" by Nathan Ross. In particular, Propositions 2.4, 2.5 and Theorem 3.1 contain the details of what I've intuitively tried to say.
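To illustrate the converse direction numerically, one can compare the gap $\int (f'(t) - tf(t))q(t)dt$ for the Gaussian density and for a non-Gaussian density with the same mean and variance. The sketch below is only an illustration: the helper names `simpson` and `stein_gap`, the test function $f = \sin$, and the uniform density on $[-\sqrt 3, \sqrt 3]$ are assumptions of mine and are not taken from the survey.

```python
import math

def simpson(g, a, b, n=20000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3.0

def stein_gap(q, f, df, a, b):
    """Integral over [a, b] of (f'(t) - t f(t)) q(t) dt."""
    return simpson(lambda t: (df(t) - t * f(t)) * q(t), a, b)

def gauss(t):
    """Standard Gaussian density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

# Uniform density on [-sqrt(3), sqrt(3)]: mean 0 and variance 1.
half_width = math.sqrt(3.0)

def uniform(t):
    """Uniform density, evaluated only inside its support."""
    return 1.0 / (2.0 * half_width)

gap_gauss = stein_gap(gauss, math.sin, math.cos, -12.0, 12.0)
gap_unif = stein_gap(uniform, math.sin, math.cos, -half_width, half_width)
print(gap_gauss)  # close to 0
print(gap_unif)   # clearly nonzero
```

A single test function cannot certify that $q = p$, of course; the lemma requires the identity for all $f \in \mathcal H$. One failing test function, however, is enough to rule a density out.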
