Why are Probability Generating Functions important

generating-functions, probability

I am trying to learn more about Probability Generating Functions. Here is my basic understanding:

For a discrete random variable $X$, the probability generating function $G_X(s)$:

$$ G_X(s) = E(s^X) = \sum_{x=0}^{\infty} s^x P(X = x) $$

  • $E(s^X)$ is the expected value of $s^X$
  • $P(X = x)$ is the probability that the random variable $X$ equals $x$

It seems to me that the Probability Generating Function is similar to a Moment Generating Function, just that one is for the discrete case (PGF) and the other is for the continuous case (MGF). The derivatives of the PGF can be used to describe the mean and variance of the original random variable:

$$ G'_X(1) = E[X] $$
$$ G''_X(1) = E[X(X-1)] $$
$$ P(X=k) = \frac{1}{k!} \frac{d^k G_X(s)}{ds^k} \Bigg|_{s=0} $$
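
To convince myself of these identities, here is a small check in R for a fair six-sided die (using the polynom package, which I also use further below; I am relying on its deriv() and predict() methods to differentiate and evaluate the polynomial):

    library(polynom)

    G  <- polynomial(c(0, rep(1/6, 6)))     # G_X(s) = (s + s^2 + ... + s^6)/6
    G1 <- deriv(G)                          # G'_X(s)
    G2 <- deriv(G1)                         # G''_X(s)

    predict(G1, 1)                          # E[X] = 3.5
    predict(G2, 1)                          # E[X(X-1)] = 70/6, about 11.67
    predict(deriv(G2), 0) / factorial(3)    # P(X = 3) = 1/6, via the k-th-derivative formula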

This leads me to my question: Why are Probability Generating Functions useful in the real world? In what kinds of problems/applications can PGFs be helpful? Why would life be difficult without PGFs?

Doing some reading into this, I came across some materials indicating that PGFs are useful for finding the properties of the distributions of functions of random variables. I tried to create an example to test my understanding:

Suppose we have:

  • X1: A regular six-sided die (Die 1), where each face {1, 2, 3, 4, 5, 6} has an equal probability of 1/6.
  • X2: An irregular six-sided die (Die 2), where the faces {1, 2, 3, 4, 5, 6} have probabilities {1/12, 1/12, 1/12, 1/12, 1/6, 1/2}, respectively.

The PGF for each of these dice is given by:

$$ G_X(s) = \sum_{x=1}^{6} s^x P(X = x) $$

$$ G_{X1}(s) = \frac{1}{6}s + \frac{1}{6}s^2 + \frac{1}{6}s^3 + \frac{1}{6}s^4 + \frac{1}{6}s^5 + \frac{1}{6}s^6 $$

$$ G_{X2}(s) = \frac{1}{12}s + \frac{1}{12}s^2 + \frac{1}{12}s^3 + \frac{1}{12}s^4 + \frac{1}{6}s^5 + \frac{1}{2}s^6 $$

If we want to find the distribution of the sum of the two dice (assuming the two dice are rolled independently; only independence is needed here, since the two dice do not have identical distributions), we can use the fact that the PGF of the sum of two independent random variables is the product of their individual PGFs. Without PGFs, it seems like this problem would be much harder and involve manually enumerating all outcomes and their probabilities.

If I understand this correctly, if we define $Y = X1 + X2$, then:

$$ G_Y(s) = G_{X1}(s) \cdot G_{X2}(s) $$
$$ G_Y(s) = \left(\frac{1}{6}s + \frac{1}{6}s^2 + \frac{1}{6}s^3 + \frac{1}{6}s^4 + \frac{1}{6}s^5 + \frac{1}{6}s^6\right) \cdot \left(\frac{1}{12}s + \frac{1}{12}s^2 + \frac{1}{12}s^3 + \frac{1}{12}s^4 + \frac{1}{6}s^5 + \frac{1}{2}s^6\right) $$

Suppose I want to find out the probability of rolling a 3. As I understand it, if I expand this multiplication and take the coefficient of the $s^3$ term, this should correspond to the probability of rolling a total of 3.

I used symbolic multiplication in R to do this:

    # https://stackoverflow.com/questions/39979884/calculate-the-product-of-two-polynomial-in-r
    library(polynom)
    
    GX1 <- polynomial(c(0, 1/6, 1/6, 1/6, 1/6, 1/6, 1/6))
    GX2 <- polynomial(c(0, 1/12, 1/12, 1/12, 1/12, 1/6, 1/2))
    
    result <- GX1 * GX2
    
    print(result)
    0.01388889*x^2 + 0.02777778*x^3 + 0.04166667*x^4 + 0.05555556*x^5 + 0.08333333*x^6 + 0.1666667*x^7 + 0.1527778*x^8 + 0.1388889*x^9 + 0.125*x^10 +
    0.1111111*x^11 + 0.08333333*x^12

Therefore, it seems like there is a probability of about 0.0278 (the coefficient of $x^3$, exactly $1/36$) of rolling a total of 3 in this situation.
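
As a sanity check on that coefficient, the same number also comes out of a brute-force enumeration of all 36 outcome pairs (the manual route mentioned above), using only base R:

    # Enumerate all (roll1, roll2) pairs and sum the joint probabilities by total
    p1 <- rep(1/6, 6)
    p2 <- c(1/12, 1/12, 1/12, 1/12, 1/6, 1/2)

    joint  <- outer(p1, p2)          # P(X1 = i, X2 = j) under independence
    totals <- outer(1:6, 1:6, "+")   # the corresponding totals i + j

    tapply(as.vector(joint), as.vector(totals), sum)
    # the entry for a total of 3 is 0.02777778, matching the s^3 coefficient above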

Did I get the right idea? Is this the main application of Probability Generating Functions?

  • PS: I tried to confirm this result using a simulation in R

It looks like I got the same number:

    dice1 <- c(1, 2, 3, 4, 5, 6)
    dice2 <- c(1, 2, 3, 4, 5, 6)

    # Define the probabilities
    prob1 <- rep(1/6, 6) # equal probabilities for dice1
    prob2 <- c(1/12, 1/12, 1/12, 1/12, 1/6, 1/2) # different probabilities for dice2

    sums <- numeric(1000000)

    for (i in 1:1000000) {
        roll1 <- sample(dice1, size = 1, prob = prob1)
        roll2 <- sample(dice2, size = 1, prob = prob2)
        sums[i] <- roll1 + roll2
    }

    relative_percent <- table(sums) / length(sums)

    result_df <- data.frame('Sum' = as.numeric(names(relative_percent)),
                            'Relative_Percent' = as.numeric(relative_percent))

    print(result_df, row.names = FALSE)

     Sum Relative_Percent
       2         0.013672
       3         0.027447
       4         0.041687
       5         0.055431
       6         0.083465
       7         0.166814
       8         0.152676
       9         0.137949
      10         0.125130
      11         0.111553
      12         0.084176

Best Answer

As you noticed, the characteristic, moment generating and probability generating functions are related to each other, in that they represent different faces of the same coin. Indeed, they correspond respectively to the Fourier, Laplace and Mellin transforms of the probability mass/density function, which are themselves related through changes of variable.

As always in mathematics, they are available tools which you may take advantage of whenever the context requires them. It is worth highlighting that they are also unique to each distribution, so that they serve as a criterion allowing us to distinguish/recognize the distribution behind a random variable.

Nonetheless, their main purpose is very down-to-earth in the end: they simplify some computations. For example, you have most certainly already computed moments of a random variable; the computations are more or less lengthy when dealing with the mean or variance, but they become more involved for higher moments. That is why the probability generating function provides a quicker way to compute them through differentiation: in a way, all the moments are contained in, and can be extracted from, that single function once and for all.
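
For instance, with your irregular die, the third factorial moment, which is already somewhat tedious to assemble by hand, drops out of repeated differentiation in a couple of lines (a small sketch reusing the polynom package from your code):

    library(polynom)

    GX2 <- polynomial(c(0, 1/12, 1/12, 1/12, 1/12, 1/6, 1/2))   # PGF of the irregular die

    # Third factorial moment E[X2 (X2 - 1)(X2 - 2)] = G'''_{X2}(1)
    predict(deriv(deriv(deriv(GX2))), 1)
    # 72.5

    # Direct check: sum over x of x (x - 1)(x - 2) P(X2 = x)
    sum((1:6) * (0:5) * (-1:4) * c(1/12, 1/12, 1/12, 1/12, 1/6, 1/2))
    # 72.5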

Another well-known example is given by linear combinations of random variables. In that case, the probability mass/density function of the combination is the ($n$-fold) convolution product of the individual probability functions, which may turn out to be quite hard to handle. However, thanks to the convolution theorems associated with the Fourier/Laplace/Mellin transforms, the linear combination is described simply by the ordinary product of the characteristic / moment generating / probability generating functions, respectively. For example, the probability generating function of $X = \lambda_1 X_1 + \ldots + \lambda_n X_n$, with $X_1,\ldots,X_n$ independent, is given by $G_X(z) = G_{X_1}\left(z^{\lambda_1}\right) \cdots G_{X_n}\left(z^{\lambda_n}\right)$. When the number of random variables is itself a random variable $N$ (cf. compound processes) and the $X_i \sim X$ are iid, we even have $G_{X_1+\ldots+X_N}(z) = G_N(G_X(z))$.
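
To make that last identity concrete, here is a small numerical sketch in R; the choices $N \sim \text{Poisson}(2)$ and a fair die for $X$ are purely illustrative. The mean of the compound sum can be read off a finite-difference derivative of $G_N(G_X(z))$ at $z = 1$ and agrees with a direct simulation:

    # PGFs (illustrative choices: fair die for X, N ~ Poisson(lambda = 2))
    Gx <- function(z) mean(z^(1:6))       # G_X(z) = (z + z^2 + ... + z^6) / 6
    Gn <- function(z) exp(2 * (z - 1))    # G_N(z) = exp(lambda (z - 1))
    Gs <- function(z) Gn(Gx(z))           # PGF of S = X_1 + ... + X_N

    # E[S] = G_S'(1), approximated by a crude finite difference; E[N] E[X] = 2 * 3.5 = 7
    h <- 1e-6
    (Gs(1) - Gs(1 - h)) / h

    # Simulation check
    set.seed(42)
    S <- replicate(1e5, sum(sample(1:6, rpois(1, 2), replace = TRUE)))
    mean(S)    # close to 7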

Finally, it is worth noting that some distributions involve such nasty expressions that they do not even have a closed-form probability mass/density function; as a consequence, the characteristic/generating function is the only way to describe them.
