Moment-Generating Functions – Understanding the Role of “t” in Generating Functions

Tags: characteristic function, moment-generating function, moments, probability, random variable

I am studying generating functions applied to probability (moment generating functions, probability generating functions and characteristic functions). I see their purpose and usefulness perfectly well, but I fail to grasp the underlying intuition behind the definitions. Is there any way to derive the functions from anywhere? I see a certain analogy between mgf and Laplace transform and cf and Fourier transform. What does the index $t$ stand for?

Edit:

I will rephrase the question. As Neil G kindly pointed out, the Wikipedia page suggests that the moment generating function is the two-sided Laplace transform of the probability density function of a continuous random variable. Concentrating on the mgf, the definition is:

$M_{X}(t)=E(e^{tX})$

$t\in \mathbb{R}$

Now, to my knowledge, the Laplace transform can be seen as the continuous analogue of a power series. How does the Laplace transform provide any connection between a continuous probability density function and its moments? I can see that taking the derivative of the function and evaluating at $t=0$ gives the moment (if the integral is absolutely convergent), but why?

Best Answer

In a sense, an MGF is simply a way of encoding a set of moments into a convenient function in a way that you can do some useful things with the function.

The variable $t$ in no way relates to the random variable $X$. You could as readily write $M_X(s)$ or $M_X(u)$; it is, in essence, a kind of dummy variable. It doesn't stand for anything beyond being the argument of the mgf.

Herbert Wilf [1] calls a generating function:

"a clothesline on which we hang up a sequence of numbers for display"

It really wouldn't matter which exact clothesline you hung them on; another would do just as well.

Is there any way to derive the functions from anywhere?

There's more than one way to turn a set of moments into a generating function. For example, a discrete distribution has a probability generating function, a moment generating function, a cumulant generating function and a characteristic function, and you can recover the moments (in some cases less directly than others) from any of them.

So there's not a unique way to encode a set of moments into a function; it's a matter of choice about how you set it up. While they're similar (and, naturally, related), some are more convenient for particular kinds of tasks.
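As a rough summary of how these encodings relate, assume $X$ takes non-negative integer values and that each function below exists near the point where it is evaluated. The symbols $G_X$, $K_X$ and $\varphi_X$ are notation chosen here for illustration, not taken from the question:

$G_X(z)=E(z^{X})$

$M_X(t)=E(e^{tX})=G_X(e^{t})$

$K_X(t)=\log M_X(t)$

$\varphi_X(t)=E(e^{itX})=G_X(e^{it})$

Under those assumptions the moments can be read off in slightly different ways: $M_X^{(n)}(0)=E(X^n)$ and $\varphi_X^{(n)}(0)=i^n E(X^n)$ give raw moments directly, whereas $G_X'(1)=E(X)$ and $G_X''(1)=E(X(X-1))$ give factorial moments, and $K_X'(0)=E(X)$ and $K_X''(0)=\operatorname{Var}(X)$ give cumulants.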

I see a certain analogy between mgf and Laplace transform and cf and Fourier transform.

Not merely an analogy, at least if we consider the bilateral Laplace transform (which I'll still denote as $\mathcal{L}$ here). We see $M_X(t) = \mathcal{L}_X(-t)$ is (at least up to a change of sign) really a Laplace transform (indeed, consider $\mathcal{L}_X(-t) =\mathcal{L}_{-X}(t)$, so it's the bilateral Laplace transform of a flipped variate). One can convert readily from one to the other, and use results for Laplace transforms on mgfs quite happily (and, for that matter, tables of Laplace transforms, if we keep that sign issue in mind). Similarly, characteristic functions are not merely analogous to Fourier transforms, they are Fourier transforms (again, up to the sign of the argument which is of no consequence outside the obvious effect swapping the sign of the argument has on a function).
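Written out for a continuous random variable with density $f_X$, the correspondence looks roughly as follows. The density $f_X$, the symbol $\varphi_X$ for the characteristic function, and the transform conventions are choices made here for illustration; other sign and scaling conventions are common:

$\mathcal{L}_X(s)=\int_{-\infty}^{\infty} e^{-sx} f_X(x)\,dx$

$M_X(t)=E(e^{tX})=\int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx=\mathcal{L}_X(-t)$

$\varphi_X(t)=E(e^{itX})=\int_{-\infty}^{\infty} e^{itx} f_X(x)\,dx=\hat{f}_X(-t)$, where $\hat{f}_X(\omega)=\int_{-\infty}^{\infty} e^{-i\omega x} f_X(x)\,dx$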

If Fourier transforms and Laplace transforms help give you intuition about what mgfs and cfs "are", you should certainly exploit those intuitions. On the other hand, it's not always necessary to have intuition when manipulating these things.

In fact, when playing with cfs, because they always exist and uniquely determine the distribution, I often tend to think of them as just the distribution looked at through a different lens.

I can see that taking the derivative of the function and evaluating at $t=0$ gives the moment (if the integral is absolutely convergent), but why?

Because the particular generating function we chose to use (the mgf) was set up to work that way. In order to be able to extract the set of moments from the function again you need something like that: a way to eliminate all the lower ones (such as differentiation) and a way to eliminate all the higher ones (such as setting the argument to 0), so that you can pick out the exact one you need. For that to happen you already need something that works kind of like an mgf. At the same time, it's nice if it has some other properties you can exploit (as the various generating functions we use with random variables do), so that restricts our set of choices even further.
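To make the "eliminate the lower ones, then the higher ones" step concrete, here is a sketch that assumes the mgf is finite in an open interval around 0, so that expectation and summation can be exchanged. Expanding the exponential gives

$M_X(t)=E(e^{tX})=\sum_{k=0}^{\infty} E(X^k)\,\frac{t^k}{k!}$

and differentiating $n$ times removes every term with $k<n$:

$M_X^{(n)}(t)=\sum_{k=n}^{\infty} E(X^k)\,\frac{t^{k-n}}{(k-n)!}$

Setting $t=0$ then removes every term with $k>n$, leaving $M_X^{(n)}(0)=E(X^n)$. The short Python check below illustrates this numerically for a fair six-sided die, which is a hypothetical example distribution and not something from the question. It approximates the derivatives with central finite differences, so the values agree with the directly computed moments only approximately.

```python
import math

# Hypothetical example distribution: a fair six-sided die.
faces = [1, 2, 3, 4, 5, 6]
prob = 1.0 / len(faces)


def mgf(t):
    """M_X(t) = E(exp(t X)) for the fair die."""
    return sum(prob * math.exp(t * x) for x in faces)


def raw_moment(n):
    """E(X^n) computed directly from the definition."""
    return sum(prob * x ** n for x in faces)


h = 1e-3  # finite-difference step; an arbitrary small value

# Central-difference approximations to M'(0) and M''(0).
first_derivative = (mgf(h) - mgf(-h)) / (2 * h)
second_derivative = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h ** 2

# Each line prints the approximate derivative next to the exact moment.
print(first_derivative, raw_moment(1))
print(second_derivative, raw_moment(2))
```

The two numbers on each printed line should be close but not identical, since the finite differences only approximate the derivatives at $t=0$.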

[1] Wilf, H. (1994). generatingfunctionology, 2nd ed. Academic Press Inc., San Diego. https://www.math.upenn.edu/~wilf/DownldGF.html