[Math] Change of variables for a Dirac delta function

dirac-delta, distribution-theory, integration, notation

I have often seen the following equality in Physics textbooks.
$$\int_{\mathbb{R}}\delta\left(\alpha x\right)f\left(\alpha x\right)|\alpha|dx=\int_{\mathbb{R}}\delta(u)f(u)du$$ or $$\int_{-\infty}^\infty \delta(\alpha x)\,dx =\int_{-\infty}^\infty \delta(u)\,\frac{du}{|\alpha|} =\frac{1}{|\alpha|} .$$
What does this equality mean mathematically, and how can it be made rigorous in the context of the Dirac mass? Does it have any bearing on the change of variables in an integral where a function is integrated with respect to a Dirac mass? For example, in the following integral

$$\int_{\mathbb{R}}f(\alpha x)d\delta_{x_0}(x),$$ does a scaling factor of $|\alpha|$ come out somehow?

Thank you.

Best Answer

To understand this properly, I suggest looking at distributions and how operations on distributions are defined in the first place.

A distribution is an object that acts on the space of infinitely differentiable and compactly supported functions in a linear and continuous way (check a textbook or Wikipedia for the precise definition). That is, a distribution $T$ on a set $\Omega\subset\mathbb{R}^n$ assigns to any infinitely differentiable and compactly supported function $\phi$ defined on $\Omega$ a complex number $T(\phi)$. One then notes that any locally integrable function $f$ defined on $\Omega$ induces a distribution $T_f$ via $$T_f(\phi) = \int_\Omega f(x)\phi(x)dx.$$ Now one can try to extend, by analogy, the operations available for locally integrable functions to distributions. Take, for example, translation (in the case $\Omega=\mathbb{R}^n$): define the operation $t_y$ by $t_y(f)(x) = f(x-y)$. Observe that $$T_{t_y(f)}(\phi) = \int t_y(f)(x)\phi(x)dx = \int f(x-y)\phi(x)dx = \int f(x)\phi(x+y)dx = T_f(t_{-y}(\phi)).$$ That is, "translating the function $f$ is the same as translating the test function $\phi$ in the opposite direction". Hence, one defines the translation of the distribution $T$ as $$t_y(T)(\phi) := T(t_{-y}\phi).$$ (Work out that the translation of the Dirac $\delta$ is what you think it should be.) Now you should be able to do the same thing with the scaling $s_a(f)(x) = f(ax)$, $a\neq 0$.
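
As a sketch of these two exercises, here is how the computations might go in one dimension, assuming $a\neq 0$. The notation $s_a(T)$ for the scaled distribution and $\delta_y$ for the Dirac mass at $y$ is chosen here for illustration; in $\mathbb{R}^n$ the factor $1/|a|$ below would become $1/|a|^n$. For translation, $$t_y(\delta)(\phi) = \delta(t_{-y}\phi) = (t_{-y}\phi)(0) = \phi(y) = \delta_y(\phi),$$ so translating $\delta$ by $y$ gives the Dirac mass at $y$. For scaling, take a locally integrable $f$ and substitute $u = ax$, so that $dx = du/|a|$ once the orientation of the integration limits is taken into account: $$T_{s_a(f)}(\phi) = \int_{\mathbb{R}} f(ax)\phi(x)dx = \frac{1}{|a|}\int_{\mathbb{R}} f(u)\phi\left(\frac{u}{a}\right)du = \frac{1}{|a|}T_f(s_{1/a}\phi).$$ This suggests the definition $$s_a(T)(\phi) := \frac{1}{|a|}T(s_{1/a}\phi),$$ and applying it to $\delta$ gives $$s_a(\delta)(\phi) = \frac{1}{|a|}(s_{1/a}\phi)(0) = \frac{1}{|a|}\phi(0),$$ which is the rigorous content of the formal identity $\delta(ax) = \delta(x)/|a|$. The second equality in the question is this identity applied, formally, to $\phi\equiv 1$ (which is not compactly supported, so it is only a heuristic there).

By contrast, when $\delta_{x_0}$ is treated as a measure, integration is simply evaluation at $x_0$: $$\int_{\mathbb{R}} f(\alpha x)d\delta_{x_0}(x) = f(\alpha x_0),$$ with no factor of $|\alpha|$. The factor $1/|a|$ above is the Jacobian of the Lebesgue measure $dx$ used in the pairing $T_f(\phi)$, and no Lebesgue measure appears when integrating against the Dirac mass itself.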