Expected Value – Expectation of the Product of IID Random Variables

Tags: expected value, moments, random variable, simulation

If we have iid random variables $X_1,X_2,…,X_N$ with $\mathbb{E}X_i=\mu$, is it true that $\mathbb{E}\prod X_i=\mu^N$?

I had no doubt that this was true until I tried it out in Python, using random.normalvariate() to generate the samples, and surprisingly found that the product of all these data points is generally far smaller than $\mu^N$.

For example, I used that function to generate 2 million data points (there's absolutely no need for a dataset of this size, but I went for it anyway) that are, supposedly, distributed as $N(1, 0.2)$. I was hoping for their product to scatter around 1 as I repeated the trial, but instead I constantly got numbers on the order of $\pm10^{-18500}$.
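In sketch form, the experiment looks like this (a minimal reconstruction, not the original script; the seed and the smaller sample size are arbitrary choices). It accumulates $\log_{10}|x|$ instead of the raw product, since the product itself underflows to zero in double precision:

```python
import math
import random

random.seed(42)  # arbitrary seed, for reproducibility

N = 100_000  # smaller than 2 million, but the effect is the same
log10_abs = 0.0  # running log10 of |product|, to avoid underflow
sign = 1
for _ in range(N):
    x = random.normalvariate(1.0, 0.2)
    log10_abs += math.log10(abs(x))
    if x < 0:
        sign = -sign

# The product is sign * 10**log10_abs: an astronomically small magnitude.
print(sign, log10_abs)
```

The printed exponent comes out on the order of $-900$ at this sample size, which scales to the reported $\sim 10^{-18500}$ at $N = 2\times 10^6$, since $E[\log_{10} X]$ is a small negative number per factor.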

For what it's worth, I've tried sample sizes ranging from 1 to 1000, and they all fell significantly below their respective $\mu^N$; the difference was clearly visible on a log-scaled plot.

I suspected that random.normalvariate() was generating something whose PDF is not that of $N(1,0.2)$. But I plotted those 2 million data points and got a perfect bell-shaped curve.

I suspected that the data were correlated among themselves. But I tried to compute $\mathbb{E}\prod X_i$ with $\text{corr}(X_i,X_j)=\rho$ and found that my calculation could not explain the discrepancy. I'm not a hundred percent confident in that calculation, though.

And I tried to understand it intuitively, and had the following thought. Say we have a lot of $N(1,0.2)$ data points. It is conceivable that they lie symmetrically around 1, so we can group them into pairs of the form $\{1-a_n,\ 1+a_n\}$. This should be feasible when the sample size is large enough. But each pair's product, $(1-a_n)(1+a_n) = 1-a_n^2$, is less than 1. Therefore, the total product is a lot smaller than 1.

So, this seems to me a paradox. I can't dissuade myself from $\mathbb{E}\prod X_i=\mu^N$, but neither can I find the loophole in the thought above (or my empirical tests). I feel that I must have made a blatant mistake somewhere, but I can't locate that error. Please help me out if you know the answer!

Best Answer

First, let's establish the correct identity.

When $X_1, \ldots, X_N$ are independent variables with finite expectations $\mu_i = E[X_i],$ then by the laws of conditional expectation (and the induction hypothesis applied to the inner expectation),

$$E\left[\prod_{i=1}^N X_i\right] =E\left[X_N E\left[\prod_{i=1}^{N-1} X_i \mid X_{N}\right]\right] = E\left[X_N \prod_{i=1}^{N-1} \mu_i\right] = \mu_N\prod_{i=1}^{N-1} \mu_i= \prod_{i=1}^N \mu_i$$

gives a proof by mathematical induction (beginning with the base case $N=1$ where $$E\left[\prod_{i=1}^N X_i\right] = E\left[X_1\right] = \mu_1 = \prod_{i=1}^N \mu_i$$ is trivially true).
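As a quick numerical sanity check of this identity (not part of the proof), a Monte Carlo average of the product for a small $N$ does land near $\mu^N$:

```python
import random

random.seed(0)  # arbitrary seed

N = 5            # a small number of factors
mu, sigma = 1.0, 0.2
trials = 200_000

total = 0.0
for _ in range(trials):
    prod = 1.0
    for _ in range(N):
        prod *= random.normalvariate(mu, sigma)
    total += prod

# The sample mean of the products should be close to mu**N = 1.
print(total / trials)
```

With $N$ this small the distribution of the product is not yet badly skewed, so the sample mean converges quickly; the tension described in the question only bites for large $N$.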

Now, let's find an explanation for the simulation results.

The pairing argument in the question is an interesting one, because it shows that when multiplications by $(1-a)$ and $(1+a)$ occur in equal numbers (approximately $N/2$ each), the net product is $(1-a^2)^{N/2}\approx \exp(-Na^2/2).$ This suggests that when $N$ is sufficiently large, it's nearly certain that the product will be tiny--certainly less than the common mean of $1.$ The reason this is not a paradox is that there will be a vanishingly small--but still positive--probability of yielding a whopping big number on the order of $(1+a)^N \approx \exp(aN).$ This rare chance of a huge product balances out all the tiny products, keeping the mean at $1.$
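The pairing estimate is easy to tabulate. Here is a short check (my own, with $a=0.2$ as in the question) of $(1-a^2)^{N/2}$ against the approximation $\exp(-Na^2/2)$:

```python
import math

a = 0.2
for N in (100, 1_000, 10_000):
    paired = (1 - a * a) ** (N / 2)    # product of N/2 pairs (1-a)(1+a)
    approx = math.exp(-N * a * a / 2)  # the exponential approximation
    print(N, paired, approx)
```

Both columns collapse toward zero at the same exponential rate, which is exactly the "typical" behavior of the product.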

It is not easy to analyze the product of many Normal variables. Instead, we may gain insight from a simpler case. Let $Y_1, Y_2, \ldots,$ be a sequence of independent Rademacher variables: that is, each of these has a $1/2$ chance of being either $1$ or $-1.$ Pick some number $0 \lt a \lt 1$ and define $X_i = 1 + aY_i,$ so that each $X_i$ has equal chances of being $1\pm a.$ Clearly $E[X_i] = 1 = \mu_i$ for all $i.$

Consider the product of the first $N$ of these $X_i.$ Suppose, in a simulation, that $k$ of these values equal $1-a$ and (therefore) the remaining $N-k$ of them equal $1+a.$ The product then is $(1-a)^k(1+a)^{N-k}.$ How small must $k$ be for this product to exceed $1$?

Given $N$ and $a,$ we must solve the inequality

$$(1-a)^k(1+a)^{N-k} \ge 1$$

for $k.$ By taking logarithms, this is equivalent to

$$k \le N \frac{\log(1+a)}{\log(1+a) - \log(1-a)}.$$

Because each $X_i$ has equal and independent chances of being $1\pm a,$ the count $k$ has a Binomial$(N, 1/2)$ distribution, which even for moderate sizes of $N$ ($N \ge 10$ is fine) is nicely approximated by a Normal distribution with mean $N/2$ and standard deviation $\sqrt{N}/2.$ Thus, the chance that the product is $1$ or greater will be close to the value of the standard Normal CDF at $Z$ (the tail area under the bell curve to the left of $Z$), where

$$Z = \frac{N \frac{\log(1+a)}{\log(1+a) - \log(1-a)} - \frac{N}{2}}{\sqrt{N}/2} = \text{constant}\times \sqrt{N}.$$

You can see where this is going! As $N$ grows large, $Z$ is pushed further out to the left, making it less and less likely to observe any product greater than $1$ in a simulation.
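A small simulation of the Rademacher model (my own sketch) shows the agreement between the simulated chance of a product $\ge 1$ and the Normal approximation, at a moderate $N$ where the event is still observable:

```python
import math
import random

random.seed(2024)  # arbitrary seed

a = 0.2
N = 100        # moderate N, so products >= 1 still occur
trials = 50_000

hits = 0
for _ in range(trials):
    prod = 1.0
    for _ in range(N):
        prod *= (1 + a) if random.random() < 0.5 else (1 - a)
    if prod >= 1:
        hits += 1

# Normal approximation derived above: P(product >= 1) ~ Phi(Z)
p_star = math.log(1 + a) / (math.log(1 + a) - math.log(1 - a))
Z = (N * p_star - N / 2) / (math.sqrt(N) / 2)
approx = 0.5 * (1 + math.erf(Z / math.sqrt(2)))
print(hits / trials, approx)
```

At $N=100$ the two numbers already agree to within a couple of percentage points (the residual gap is mostly the missing continuity correction), and both shrink rapidly as $N$ grows.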

In the question, where the standard deviation is $0.2,$ the value $a=0.2$ will closely reproduce the simulation behavior. In this case the constant is

$$\text{constant} = \frac{2\log(1+a)}{\log(1+a)-\log(1-a)} - 1 = -0.100\ldots$$

Taking $N=2\times 10^6,$ for instance, as in the question, compute $Z \approx -142.$ The chance that $k$ is small enough to produce a value this negative is less than $10^{-4000}.$ You can't even represent such a number in double-precision floats. It would take far more than the age of the universe to run a simulation that had the remotest chance of producing such an imbalance between the $1+a$ and $1-a$ values that the product exceeds $1.$
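These magnitudes can be reproduced in a few lines (a sketch; the tail probability is bounded via the standard Gaussian estimate $\Phi(Z) \le \varphi(Z)/|Z|$ for $Z < 0,$ reported as a base-10 exponent since the probability itself underflows):

```python
import math

a = 0.2
N = 2_000_000

# constant = 2*log(1+a)/(log(1+a) - log(1-a)) - 1, about -0.1007 for a = 0.2
const = 2 * math.log(1 + a) / (math.log(1 + a) - math.log(1 - a)) - 1
Z = const * math.sqrt(N)  # about -142

# log10 of the Gaussian tail bound phi(|Z|)/|Z|
log10_tail = (-Z * Z / 2 - math.log(abs(Z) * math.sqrt(2 * math.pi))) / math.log(10)
print(round(const, 4), round(Z), round(log10_tail))
```

The computed exponent is in the low thousands of negative powers of ten, far beyond what double precision (down to about $10^{-308}$) can represent.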

In short, for all practical purposes, when $N$ is sufficiently large ($N \gg 5000$ will do when $a=1/5$), you will never observe a value above $1$ in this simulation, even though the mean of the product is $1.$
