Sampling theory (signal processing): why is multiplication by the Dirac comb used to represent sampling?

dirac-delta, sampling-theory

Wikipedia (and other sources) make the following claim (from https://en.wikipedia.org/wiki/Dirac_comb):


The claim is that multiplying any function by a Dirac comb transforms it into a train of impulses with integrals equal to the value of the function at the nodes of the comb: $(\operatorname{\text{Ш}}_T x)(t) = \sum_{k=-\infty}^{\infty} x(t)\delta(t - kT) = \sum_{k=-\infty}^{\infty} x(kT)\delta(t - kT)$

It's obvious that this representation works for $t\ne kT$ since $\delta(t-kT)=0$. But when $t=kT$, why is $x(t)=x(kT)$? That is, why is $x(t)\delta(0)$ not undefined, or equal to $cx(t)$ for some constant $c$?

A lot of sampling theory is derived using this representation, so why is it correct? If it isn't strictly correct, does this mean much of sampling theory is based on a flawed representation?

There are lots of related questions here, but I've been unable to find a direct treatment of this topic.

Best Answer

The Dirac "function" $\delta(x)$ is not actually a function, but a distribution. Its defining property is that for any smooth compactly supported function $\varphi$: $$\int\varphi(x) \delta(x) \text{d}x = \varphi(0)$$

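Now, if $f$ is a smooth function, then $\varphi f$ is again smooth and compactly supported, so you have: $$\int \varphi(x) f(x) \delta(x) \text{d}x = \varphi(0)f(0) = \int\varphi(x) f(0)\delta(x) \text{d}x $$

and therefore, as an equality of distributions: $$f(x) \delta(x) = f(0) \delta(x)$$

The same argument applied to the shifted delta $\delta(t-kT)$ gives $x(t)\delta(t-kT) = x(kT)\delta(t-kT)$, which is the identity used term by term in the Dirac comb.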
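As a rough numerical sanity check (not a proof), one can replace $\delta$ with a narrow unit-area Gaussian and compare both sides of the identity after integrating against a test function. Everything in this sketch is an assumption made for illustration: the Gaussian approximation `nascent_delta`, the helper names `pair` and `compare_sides`, the choice $x = \cos$, and the test function $e^{-t^2}$, which is smooth and rapidly decaying but not compactly supported, so it only stands in for a true test function.

```python
import numpy as np

def nascent_delta(t, eps):
    """Unit-area Gaussian of width eps; approximates delta(t) as eps -> 0."""
    return np.exp(-t**2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))

def pair(g, t):
    """Riemann-sum approximation of the integral of samples g on the uniform grid t."""
    return float(np.sum(g) * (t[1] - t[0]))

def compare_sides(x, phi, T, k, eps, t):
    """Integrate phi against x(t)*delta_eps(t-kT) and against x(kT)*delta_eps(t-kT)."""
    d = nascent_delta(t - k * T, eps)
    lhs = pair(phi(t) * x(t) * d, t)       # uses x(t) inside the integral
    rhs = pair(phi(t) * x(k * T) * d, t)   # uses the constant x(kT) instead
    return lhs, rhs

# Illustrative choices (assumptions, not taken from the question).
t = np.linspace(-5.0, 5.0, 2_000_001)
x = np.cos
phi = lambda s: np.exp(-s**2)
T, k = 0.5, 3

for eps in (1e-1, 1e-2, 1e-3):
    lhs, rhs = compare_sides(x, phi, T, k, eps, t)
    # The gap between the two sides should shrink as eps decreases.
    print(f"eps={eps:g}  lhs={lhs:.8f}  rhs={rhs:.8f}  gap={abs(lhs - rhs):.2e}")
```

Under these assumptions, both integrals should approach $\varphi(kT)\,x(kT)$ as `eps` shrinks, which is what the distributional equality asserts: the two sides are indistinguishable once paired with a test function, even though $\delta(0)$ is never evaluated pointwise.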