Characteristic Function as a Fourier Transform

characteristic-functions, fourier-analysis, probability, probability-theory

The Fourier transform of a function $f$ is defined to be:

$$\hat{f}(\omega)=\int_{\mathbb R}e^{-it\omega}f(t)\,dt$$

My understanding is that, essentially, $e^{-it\omega}$ controls the frequency at which our function $f(t)$ is wrapped around the unit circle in the complex plane, with $f(t)$ dictating the radius of the polar graph for a given $t$. That is, if my frequency is $10$, then every rotation around the unit circle traverses $\frac{1}{10}$ seconds of my graph $f(t)$, so in $10$ rotations I have covered $1$ second of my function $f(t)$. The Fourier transform then outputs the center of mass of this polar graph for a given frequency, taken across all $t$.
For wave functions this intuitively makes sense, but what does the Fourier transform of a probability density function tell us? How do I interpret frequency with respect to a non-wave function?
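To make the winding picture concrete, here is a small numerical sketch (my own check, assuming NumPy, and using the standard normal density purely as an example): it computes the "center of mass" of the wound-up graph on a grid and compares it with the known closed form $e^{-\omega^2/2}$ for that density.

```python
import numpy as np

# Standard normal density: the function being "wound" around the origin.
def f(t):
    return np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)

# Approximate \hat{f}(w) = \int e^{-i t w} f(t) dt on a grid.  Each sample
# e^{-i t w} f(t) is a point in the complex plane at radius f(t); the sum,
# weighted by dt, is the center of mass of the wound-up graph (the total
# mass \int f(t) dt is 1 for a density).
t = np.linspace(-10.0, 10.0, 20001)
dt = t[1] - t[0]

for w in [0.0, 0.5, 1.0, 2.0]:
    center_of_mass = np.sum(np.exp(-1j * t * w) * f(t)) * dt
    print(w, center_of_mass, np.exp(-w**2 / 2))  # closed form for comparison
```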

Best Answer

Indeed, the fact that this is a Fourier transform is by and large a mathematical coincidence; the intuition comes not from interpreting it as a Fourier transform, but from considering it from another angle: that of moment generating functions.

Throughout this answer, I assume all random variables are real-valued; it seems like that's what you're concerned about anyway.

If you have done some statistics, you are almost certainly familiar with the concept of the moment generating function of $X$, $$ M_X : \mathbb R \to \mathbb R \\ M_X(t) = \mathbb E\big[e^{tX}\big]. $$ This function has many nice properties. For instance, the $n$-th moment of $X$, $\mathbb E\big[X^n\big]$, can be found by computing $M_X^{(n)}(0)$, the $n$-th derivative of $M_X$ evaluated at $0$. Another important application is the fact that two random variables with the same moment generating function have the same distribution; that is to say, the process of determining a moment generating function is "invertible". A third and also significant application is the fact that, for any two independent random variables $X$ and $Y$, we have \begin{align*} M_{X+Y}(t) &= \mathbb E \big[e^{t(X+Y)}\big] \\ &= \mathbb E \big[e^{tX} e^{tY}\big] \\ &= \mathbb E \big[e^{tX} \big] \mathbb E \big[e^{tY} \big] \\ &= M_X(t)M_Y(t). \end{align*} (In a somewhat informal sense the third equality follows by considering $e^{tX}$ and $e^{tY}$ as independent random variables.) In conjunction with the fact that moment generating functions are invertible, this essentially permits us to derive a formula for the distribution of the sum of two independent random variables; hopefully, this application also makes clear why there is a seemingly arbitrary exponential in the definition of the moment generating function.
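As a quick worked example (a standard one, added here for illustration): if $X \sim \operatorname{Exp}(\lambda)$, then for $t < \lambda$ $$ M_X(t) = \int_0^\infty e^{tx}\,\lambda e^{-\lambda x}\,dx = \frac{\lambda}{\lambda - t}, $$ and differentiating at $0$ recovers the moments: $M_X'(0) = \frac{1}{\lambda} = \mathbb E[X]$ and $M_X''(0) = \frac{2}{\lambda^2} = \mathbb E\big[X^2\big]$.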

Now, the classical example of an application of moment generating functions is in the proof of the Central Limit Theorem. They are a natural candidate, because the CLT involves sums of independent random variables, and moment generating functions are well-equipped to deal with such matters. However, there is a glaring issue with their use: moment generating functions do not always exist. In particular, a random variable with infinite mean will not have a convergent moment generating function for any $t$ other than $0$.
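A standard example of this failure (added for illustration): the standard Cauchy distribution, with density $\frac{1}{\pi(1+x^2)}$, has $\mathbb E\big[e^{tX}\big] = \infty$ for every $t \neq 0$, since $e^{tx}$ grows exponentially in one tail while the density only decays like $1/x^2$.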

This is where characteristic functions come in. As you know, we define the characteristic function by $$ \varphi_X : \mathbb R \to \mathbb C \\ \varphi_X(t) = \mathbb E \big[ e^{itX} \big]. $$ All of the nice properties of moment generating functions mentioned above still apply to characteristic functions. In particular (a worked example follows the list):

  • the $n$-th moment of $X$ can be found as $(-i)^{n} \varphi_X^{(n)}(0)$, if it exists

  • two random variables with the same characteristic function have the same distribution

  • $\varphi_{X+Y}(t) = \varphi_X(t)\varphi_Y(t)$ for independent r.v.s $X$, $Y$ (this is proven essentially the same way as before).
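
To see these properties in a familiar case (a standard example, added for illustration): for $X \sim N(0,1)$ we have $\varphi_X(t) = e^{-t^2/2}$, so $\varphi_X''(0) = -1$ and the second moment is $(-i)^2 \varphi_X''(0) = 1$, as expected; and if $X$ and $Y$ are independent standard normals, then $\varphi_{X+Y}(t) = e^{-t^2}$, which is the characteristic function of $N(0,2)$, so $X+Y \sim N(0,2)$.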

The critical difference from moment generating functions is this: characteristic functions always exist, at least for real-valued random variables. The intuitive reason is that the possible values taken by $e^{itX}$ all lie on the unit circle, hence are bounded, so the integral defining the expected value converges and yields a value within the closed unit disc. Going back to the CLT example, this then allows us to complete the proof without issue; indeed, if you are interested, the proof on the Wikipedia page uses characteristic functions.
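Returning to the Cauchy example above: although its moment generating function diverges everywhere except at $0$, its characteristic function is perfectly well defined, namely $\varphi_X(t) = e^{-|t|}$.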

Based on this little narrative, it is pretty clear that the entire motivation for the introduction of $i$ in the exponent of the characteristic function is the fact that convergence will be guaranteed for a real-valued random variable. It is not much more than a nice mathematical coincidence that the characteristic function coincides with the Fourier transform, and it makes little sense (at least in my opinion) to try and carry over intuitions from the Fourier transform to the characteristic function; instead, the intuition can be seen by thinking about how this function might have been discovered in the first place.