Solved – When to prefer the moment generating function to the characteristic function

Tags: characteristic-function, moment-generating-function

Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $X : \Omega \to \mathbb{R}^n$ be a random vector. Let $P_X = X_* P$ be the distribution of $X$, a Borel measure on $\mathbb{R}^n$.

  • The characteristic function of $X$ is the function
    $$
    \varphi_X(t) = E[e^{i t \cdot X}] = \int_\Omega e^{i t \cdot X} \, dP,
    $$

    defined for $t \in \mathbb{R}^n$ (the random variable $e^{i t \cdot X}$ is bounded, hence in $L^1(P)$, for all $t$). This is the Fourier transform of $P_X$.
  • The moment generating function (m.g.f.) of $X$ is the function
    $$
    M_X(t)
    = E[e^{t \cdot X}] = \int_\Omega e^{t \cdot X} \, dP,
    $$

    defined for all $t \in \mathbb{R}^n$ for which the integral above exists. This is the Laplace transform of $P_X$.

Already, we can see that the characteristic function is defined everywhere on $\mathbb{R}^n$, but the m.g.f. has a domain that depends on $X$, and this domain might be just $\{0\}$ (this happens, for example, for a Cauchy-distributed random variable).
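
As a quick numerical illustration of this contrast (a rough Monte Carlo sketch; the value of $t$ and the sample sizes are arbitrary choices of mine), the empirical characteristic function of a Cauchy sample settles near its true value $e^{-|t|}$, while the empirical "m.g.f." at any $t \neq 0$ is dominated by the largest few observations and never stabilises:

```python
import numpy as np

rng = np.random.default_rng(0)
t = 1.0

for n in (10**3, 10**5, 10**7):
    x = rng.standard_cauchy(n)                # Cauchy(0, 1) sample
    cf_hat = np.mean(np.exp(1j * t * x))      # empirical characteristic function at t
    mgf_hat = np.mean(np.exp(t * x))          # empirical "m.g.f." at t: blows up (often overflows to inf)
    print(n, abs(cf_hat), np.exp(-abs(t)), mgf_hat)
```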

Despite this, characteristic functions and m.g.f.'s share many properties, for example:

  1. If $X_1, \ldots, X_n$ are independent, then
    $$
    \varphi_{X_1 + \cdots + X_n}(t) = \varphi_{X_1}(t) \cdots \varphi_{X_n}(t)
    $$

    for all $t$, and
    $$
    M_{X_1 + \cdots + X_n}(t) = M_{X_1}(t) \cdots M_{X_n}(t)
    $$

    for all $t$ for which the m.g.f.'s exist.
  2. Two random vectors $X$ and $Y$ have the same distribution if and only if $\varphi_X(t) = \varphi_Y(t)$ for all $t$. The m.g.f. analog of this result is that if $M_X(t) = M_Y(t)$ for all $t$ in some neighborhood of $0$, then $X$ and $Y$ have the same distribution.
  3. Characteristic functions and m.g.f.'s of common distributions often have similar forms. For example, if $X \sim N_n(\mu, \Sigma)$ ($n$-dimensional normal with mean $\mu$ and covariance matrix $\Sigma$), then
    $$
    \varphi_X(t) = \exp\left(i \mu\cdot t - \frac{1}{2} t \cdot (\Sigma t)\right)
    $$

    and
    $$
    M_X(t) = \exp\left(\mu\cdot t - \frac{1}{2} t \cdot (\Sigma t)\right).
    $$
  4. When some mild assumptions hold, both the characteristic function and the m.g.f. can be differentiated to compute moments (the formulas are spelled out just after this list).
  5. Lévy's continuity theorem gives a criterion for determining when a sequence of random variables converges in distribution to another random variable using the convergence of the corresponding characteristic functions. There is a corresponding theorem for m.g.f.'s (Curtiss 1942, Theorem 3).
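
To make item 4 concrete for a real-valued $X$: if $E|X|^k < \infty$, then $\varphi_X^{(k)}(0) = i^k\,E[X^k]$, and if $M_X$ is finite on a neighbourhood of $0$, then $M_X^{(k)}(0) = E[X^k]$, so that, for instance,
$$
E[X] = M_X'(0) = -i\,\varphi_X'(0), \qquad E[X^2] = M_X''(0) = -\varphi_X''(0).
$$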

Given that characteristic functions and m.g.f.'s are often used for the same purpose, and that a characteristic function always exists whereas an m.g.f. need not, it seems to me that one should often prefer to work with characteristic functions over m.g.f.'s.

Questions.

  1. What are some examples where m.g.f.'s are more useful than characteristic functions?
  2. What can one do with an m.g.f. that one cannot do with a characteristic function?

Best Answer

That's a good question, but a broad one, so I can't promise I'll say everything about it that should be said. The short answer is that rival techniques differ not in what they can do, but in how neatly they can do it.

Characteristic functions require extra caution because of the role of complex numbers. It's not even that the student needs to know about complex numbers; it's that the calculus involved has subtle pitfalls. For example, I can get a Normal distribution's MGF just by completing the square in a variable-shifting substitution, but a lot of sources carelessly pretend the approach using characteristic functions is just as easy. It isn't, because the famous normalisation of the Gaussian integral says nothing about integration on $ic+\mathbb{R}$ with $c\in\mathbb{R}\backslash\{ 0\}$. Oh, we can still evaluate the integral if we're careful with contours, and in fact there's an even easier approach, in which we show by integrating by parts that an $N(0,\,1)$ distribution's characteristic function $\phi (t)$ satisfies $\dot{\phi}=-t\phi$. But the MGF approach is even simpler, and most of the distributions students need early on have a convergent MGF on either a line segment (e.g. Laplace) or half-line (e.g. Gamma, geometric, negative binomial), or the whole of $\mathbb{R}$ (e.g. Beta, binomial, Poisson, Normal). Either way, that's enough to study moments.
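
For concreteness, here is the completing-the-square computation mentioned above, for $Z \sim N(0,1)$ (the general $N(\mu,\sigma^2)$ case follows by writing $X = \mu + \sigma Z$):
$$
M_Z(t) = \int_{-\infty}^{\infty} e^{tz}\,\frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}\,dz
= e^{t^2/2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-(z-t)^2/2}\,dz
= e^{t^2/2},
$$
where the shift $z \mapsto z + t$ reduces the last integral to the usual Gaussian normalisation, with no contour argument required.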

I don't think there's anything you can do only with the MGF, but you use what's easiest for the task at hand. Here's one for you: what's the easiest way to compute the moments of a Poisson distribution? I'd argue it's to use a different technique again, the probability-generating function $G(t)=\mathbb{E}t^X=\exp \lambda (t-1)$. Then the falling Pochhammer symbol $(X)_k$ gives $\mathbb{E}(X)_k=G^{(k)}(1)=\lambda^k$. In general it's usually worth using the PGF for discrete distributions, the MGF for continuous distributions that either are bounded or have superexponential decay in the PDF's tails, and the characteristic function when you really need it.
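
If you want to check that PGF calculation mechanically, here is a small symbolic sketch with sympy (the variable names are mine; nothing is assumed beyond $G(t)=\exp\lambda(t-1)$):

```python
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
G = sp.exp(lam * (t - 1))    # Poisson(lambda) probability-generating function, G(t) = E[t^X]

# k-th factorial moment E[(X)_k] = G^{(k)}(1); each should simplify to lambda**k
for k in range(1, 5):
    print(k, sp.simplify(sp.diff(G, t, k).subs(t, 1)))
```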

And depending on the question you're asking, you may instead find it prudent to use the cumulant generating function, defined as the logarithm of either the MGF or the CF. For example, I'll leave it as an exercise that the log-MGF definition of cumulants for the maximum of $n$ i.i.d. $\operatorname{Exp}(1)$ random variables gives $\kappa_m=(m-1)!\sum_{k=1}^n k^{-m}$, which makes computing the mean and variance (respectively $\kappa_1$ and $\kappa_2$) much easier than if you'd written them in terms of moments.
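
One route through that exercise (a sketch, and not the only one: it leans on the representation of the maximum as a sum of independent exponentials, $\max_{k\le n} E_k \stackrel{d}{=} \sum_{k=1}^n E_k/k$ for i.i.d. $E_k \sim \operatorname{Exp}(1)$):
$$
K(t) = \log M_{\max}(t) = \sum_{k=1}^{n} \log\frac{1}{1 - t/k} = -\sum_{k=1}^{n} \log\!\left(1 - \frac{t}{k}\right), \qquad t < 1,
$$
and differentiating $m$ times at $t = 0$ gives $\kappa_m = K^{(m)}(0) = (m-1)!\sum_{k=1}^{n} k^{-m}$, as claimed.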
