We Can Think of the Dirac Delta Function as Being the Limit Point of a Series of Functions That Put Less and Less Mass On All Points Other Than Zero

dirac delta, functional-analysis, probability distributions, real-analysis, sequences-and-series

When describing the Dirac delta function and Dirac distribution, my textbook says the following:

In some cases, we wish to specify that all the mass in a probability distribution clusters around a single point. This can be accomplished by defining a PDF using the Dirac delta function, $\delta(x)$:

$$p(x) = \delta(x - \mu)$$

The Dirac delta function is defined such that it is zero valued everywhere except $0$, yet integrates to $1$. The Dirac delta function is not an ordinary function that associates each value $x$ with a real-valued output; instead it is a different kind of mathematical object called a generalized function that is defined in terms of its properties when integrated. We can think of the Dirac delta function as being the limit point of a series of functions that put less and less mass on all points other than zero.

By defining $p(x)$ to be $\delta$ shifted by $- \mu$ we obtain an infinitely narrow and infinitely high peak of probability mass where $x = \mu$.

Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron; Bach, Francis. Deep Learning (NONE) (Page 64). The MIT Press. Kindle Edition.

I'm not sure that I understand what is meant by this part:

We can think of the Dirac delta function as being the limit point of a series of functions that put less and less mass on all points other than zero.

If someone could please demonstrate what this is (mathematically), then I'd greatly appreciate it.

If we obtain an infinitely narrow and infinitely high peak, as shown in this Wikipedia article, then wouldn't the integral over that peak be equal to $0$, since the peak is infinitely narrow and therefore has no area underneath it? This is my understanding from my study of the Riemann integral.

Thanks for any clarification.

Best Answer

The idea is that the "Dirac delta function" at $0,$ denoted by $\delta_0,$ is not an ordinary function, yet we can still integrate against it. For every continuous $g$ we have

$$\int_{\mathbb R} g\,\delta_0 = g(0).$$

However, no integrable function could have this property. That is, if $f$ is integrable on $[-a,a],$ then

$$\int_{-a}^a g(x) f(x)\,dx = g(0)$$

will fail for some continuous $g.$
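One way to see this, sketched under the assumption that "integrable" means $\int_{-a}^a |f| < \infty$ (the bump functions $g_k$ below are introduced only for this illustration): take

$$g_k(x) = \max(0,\, 1 - k|x|), \qquad k > 1/a.$$

Each $g_k$ is continuous, satisfies $0 \le g_k \le 1$ and $g_k(0) = 1,$ and vanishes outside $[-1/k, 1/k].$ Hence

$$\left|\int_{-a}^a g_k(x) f(x)\,dx\right| \le \int_{-1/k}^{1/k} |f(x)|\,dx \to 0 \quad \text{as } k \to \infty,$$

so for large $k$ the integral cannot equal $g_k(0) = 1.$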

On the other hand, and staying with elementary means, we can find a sequence $f_n$ of continuous functions on $\mathbb R$ such that

$$\lim_{n\to \infty}\int_{\mathbb R} g(x) f_n(x)\,dx = g(0)$$

for every continuous $g$ on $\mathbb R .$

Proof: Define $f_n$ on $[-1/n,1/n]$ to have the triangular graph that joins the points $(-1/n,0), (0,n), (1/n,0);$ define $f_n=0$ everywhere else. You can see that the $f_n$'s live in ever smaller intervals centered at $0,$ but nevertheless $\int_{\mathbb R} f_n = 1$ for every $n.$
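For concreteness, the triangle just described can be written in closed form (this is only a restatement of the construction above):

$$f_n(x) = \begin{cases} n\,(1 - n|x|), & |x| \le 1/n, \\ 0, & |x| > 1/n. \end{cases}$$

Its graph is a triangle with base $2/n$ and height $n,$ so

$$\int_{\mathbb R} f_n = \tfrac12 \cdot \tfrac{2}{n} \cdot n = 1.$$

The peak height $n$ grows without bound while the base $2/n$ shrinks, and the area stays fixed at $1.$ Moreover, for any $\varepsilon > 0,$ once $1/n < \varepsilon$ the function $f_n$ puts no mass at all outside $[-\varepsilon, \varepsilon].$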

Let $g$ be continuous. Then for each $n$ there exists $c_n\in [-1/n,1/n]$ such that $|g(c_n)-g(0)|$ is the maximum of $|g(x)-g(0)|$ on the interval $[-1/n,1/n].$ Thus

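$$\left|\int_{\mathbb R} g(x) f_n(x)\,dx - g(0)\right| = \left|\int_{\mathbb R} [g(x)-g(0)] f_n(x)\,dx\right| \le \int_{-1/n}^{1/n} |g(x)-g(0)|\, f_n(x)\,dx$$ $$\le |g(c_n)-g(0)| \int_{\mathbb R} f_n = |g(c_n)-g(0)|\cdot 1.$$

The first equality uses $\int_{\mathbb R} f_n = 1;$ the inequalities use that $f_n \ge 0$ and that $f_n$ vanishes outside $[-1/n,1/n].$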

As $n\to \infty,$ $c_n\to 0$ because $|c_n|\le 1/n,$ so the last expression tends to $0$ by the continuity of $g$ at $0.$
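The convergence can also be checked numerically. The following is a minimal sketch, not part of the proof: the names `f_n` and `integrate_against`, the midpoint rule, and the step count are all choices made for this illustration. It approximates $\int g f_n$ for growing $n$ and a couple of continuous test functions $g.$

```python
import math


def f_n(n, x):
    """Triangular bump: height n at x = 0, zero outside [-1/n, 1/n]."""
    return max(0.0, n * (1.0 - n * abs(x)))


def integrate_against(g, n, steps=20_000):
    """Midpoint-rule approximation of the integral of g(x) * f_n(x).

    Only [-1/n, 1/n] is sampled because f_n vanishes elsewhere.
    An even step count puts the kink at x = 0 on a cell boundary.
    """
    a = -1.0 / n
    h = (2.0 / n) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += g(x) * f_n(n, x)
    return total * h


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        mass = integrate_against(lambda x: 1.0, n)   # should stay close to 1
        value = integrate_against(math.cos, n)       # should approach cos(0) = 1
        print(n, mass, value)
```

With $g = \cos$ the printed values should move toward $g(0) = 1$ as $n$ grows, while the total mass stays at about $1$ for every $n.$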
