[Math] Intuition on Wald’s equation without using the optional stopping theorem.

intuition, probability theory

Wald's equation, even in its simplest form as stated below, simplifies many problems of calculating expectations.

Wald's Equation: Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of real-valued, independent and identically distributed random variables and let $N$ be a nonnegative integer-valued random variable that is independent of
the sequence $(X_n)_{n\in\mathbb{N}}$. Suppose that $N$ and the $X_n$ have finite expectations. Then
$$
\operatorname{E}[X_1+\dots+X_N]=\operatorname{E}[N] \cdot\operatorname{E}[X_n]\quad \forall n\in\mathbb{N}\,.
$$
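
For a concrete instance (with an arbitrary illustrative choice of distributions): let $N$ be the result of one roll of a fair die and, independently of $N$, let each $X_i$ be a further roll of the same die. Then Wald's equation gives
$$
\operatorname{E}[X_1+\dots+X_N]=\operatorname{E}[N]\cdot\operatorname{E}[X_1]=3.5\times 3.5=12.25\,.
$$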

I am looking for an intuitive explanation of Wald's equation without using the optional stopping theorem.
I'm not interested in explanations of the erroneous identity
$
\operatorname{E}[X_1+\dots+X_N]=N\cdot \operatorname{E}[X_1]
$
or in explanations that discuss only the hypotheses.

We could have, for example, a function $\varphi$ of two or more variables such that $\operatorname{E}[X_1+\dots+X_N]=\varphi\big(\operatorname{E}[N]\, ,\,\operatorname{E}[X_n]\big), \quad\forall n\in\mathbb{N}\,$.
The question then becomes: why should $\varphi(x,y)$ equal $x\cdot y$?

More generally, we could have two linear functionals $F : L^1(\Omega,\mathcal{A},P)\to \mathbb{R}$ and $G : L^1(\Omega,\mathcal{A},P)\to \mathbb{R}$ such that $\operatorname{E}[X_1+\dots+X_N]=\varphi\big(\operatorname{F}[N]\, ,\,\operatorname{G}[X_n]\big), \quad\forall n\in\mathbb{N}\,.$ So the question would be: why should $\operatorname{F}=\operatorname{G}=\operatorname{E}$ and $\varphi(x,y)=x\cdot y$?

The interest is in the intuition behind the equation. An answer based on a good example would be very welcome.

Thanks in advance.

Best Answer

One simple intuitive explanation is that, since $N$ is independent of the $X_i$, $$ \mathbb{E}[X_1+\cdots+X_N \mid N=n] = \mathbb{E}[X_1+\cdots+X_n] = n\,\mathbb{E}[X_1], $$ so by the tower property $$ \mathbb{E}[X_1+\cdots+X_N] = \mathbb{E}\big[\mathbb{E}[X_1+\cdots+X_N\mid N]\big] = \mathbb{E}[N \,\mathbb{E}[X_1]] = \mathbb{E}[N]\, \mathbb{E}[X_1]. $$ The independence is what lets you drop the conditioning on $N=n$ and treat the sum as $n$ copies of the same r.v. The identity is not quite as trivial when $N$ is a stopping time.
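
A quick sanity check of this conditioning argument is a small Monte Carlo simulation. Below is a minimal sketch in Python (assuming NumPy, and picking arbitrary distributions purely for illustration: $N$ Poisson with mean 4 and the $X_i$ exponential with mean 2, drawn independently of $N$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (arbitrary) choices: N ~ Poisson(4), X_i ~ Exponential(mean 2),
# with N drawn independently of the X_i, as Wald's equation requires.
mean_N, mean_X = 4.0, 2.0
trials = 200_000

totals = np.empty(trials)
for k in range(trials):
    n = rng.poisson(mean_N)                       # random number of summands
    totals[k] = rng.exponential(mean_X, n).sum()  # S = X_1 + ... + X_N (0 when N = 0)

print("Monte Carlo estimate of E[S]:", totals.mean())
print("Wald's prediction E[N]*E[X]:", mean_N * mean_X)
```

Both numbers should agree up to Monte Carlo error (here both are close to $4\times 2 = 8$).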
