Probability Theory – Understanding the Law of Total Variance

gamma distributionpoisson distributionprobabilityrvariance

I trying to experiment with law of total variance in order to empirically recreate theoretical results.
In particular I am interested in verifying that:
$$
Var(Y) = E[Var(Y|X)] + Var(E[Y|X])
$$

Let's assume I have the following random variables:
$$
X \sim Gamma(\alpha, \beta)
$$

$$
Y \sim Poisson(X)
$$

The total variance of Y should be equal to:
$$
Var(Y) = E[Var(Y|X)] + Var(E[Y|X]) = E[X] + Var(X) = \alpha*\beta + \alpha*\beta^2
$$

This follow from $E[X] = \alpha*\beta$, $Var(X) = \alpha*\beta^2$, $E[Y|X] = Var(Y|X) = X$, which are known results for the Gamma and Poisson distribution.
I tried to recreate such results in R with the following code:

alpha <- 250
beta <- 10

n <- 10000

X <-  rgamma(n, shape = alpha, scale = beta)

Y <- replicate(n, rpois(1, sample(X, 1)))

alpha * beta + alpha * beta^2

[1] 27500

mean(X) + var(X)

[1] 27552.47

var(Y)

[1] 26610.7

In this situation the total variance is 27500 and when I calculate it manually with the previous results it is, in fact, 27500 (or close enough).
My question is, why when I do var(Y) I get a wrong result? Am I defining the variable Y in the right way?

Thank you

Best Answer

Your estimator is quite close to the true variance, but it appears you might need a higher sample size to get as close as you want. (You also forgot to set the seed to get a reproducible analysis.) I'm going to try it with a larger number of samples:

#Set parameters
alpha <- 250
beta <- 10

#Set sample size
n <- 10^7

#Generate random variables
set.seed(1)
X <- rep(0, n)
Y <- rep(0, n)
for (i in 1:n) {
  X[i] <- rgamma(1, shape = alpha, scale = beta)
  Y[i] <-  rpois(1, lambda = X[i]) }

#Check sample variance
var(Y)
[1] 27522.87
Related Question