Calculating $\mathbb{E}[N]$ for $N = \displaystyle \min_{n\in \mathbb{N}}\Big\{\sum_{i=1}^{n}{X_i\geq5000\Big\}}$, using Wald’s lemma

expected valueprobabilitystopping-times

Suppose I have a sequence of $i.i.d$ random variables: $X_1,X_2,… \sim Geom(p)$.
That means that each of the $X_i$'s is holding an unknown random number of trials until a 'success'.

Since $0<p<1$, we know that $X_i$ has a finite expectation $\mathbb{E}[X_i] = \frac{1}{p}$, which also means that there exists an integer $N \in \mathbb{N}$ such that:
$$N = \displaystyle \min_{n\in \mathbb{N}}\Big\{X_1+X_2+…+X_n= \sum_{i=1}^{n}{X_i\geq5000}\Big\}$$

We would like to calculate the expectation of this finite integer 'stopping time' $N$. From Wald's lemma:

If $X_i$ are i.i.d. with finite $\mathbb{E}[X_i] = \mu$, and N is a finite stopping time then: $\mathbb{E}\Big[\sum_{i=1}^{N}{X_i}\Big] = \mu\mathbb{E}[N]$.

My problem is how to deal with the 'greater-equal' ($\geq$) sign. Since we define $N = \displaystyle \min_{n\in \mathbb{N}}\Big\{\sum_{i=1}^{n}{X_i\geq5000\Big\}}$, this means that $X_N$ contibutes a number of trials with which the sum exceeds $5000$, but we dont know the exact sum.

My intuition is something like: if we take the the sum as the bare minimum, then $$\mathbb{E}\Big[\sum_{i=1}^{N}{X_i}\Big] = 5000= \mathbb{E}[N]\times \frac{1}{p} \to \mathbb{E}[N] = 5000p$$
But even if that's the case I'm having trouble justifying taking the sum as exactly 5000.

Another possible approach is to condition on $\sum_{i=1}^{N}{X_i}=k$ and take the expectation, but $k = 5000, 5001,…$ and I'm not sure how to formulate this, since $k$ is potentially unbounded ($k\in [5000,\infty)$), if that's even a valid approach.

I'd love some guidance please.

Best Answer

I reached the same conclusion as Henry (+1) in a more roundabout manner. What follows is the gist of my thoughts:

Recall that a geometric random variable models the number of independent Bernoulli trials until a success occurs. Hence to every realization of a geometric random variable we can canonically associate a realization of a string of Bernoulli random variables. For example, let $X_1,\dots, X_{20}$ be a sequence of geometric random variables and $\omega$ be such that the $(X_1(\omega),\dots,X_{20}(\omega))$ is equal to $(6, 3, 2, 12, 1, 17, 1, 1, 1, 1, 2, 2, 3, 6, 1, 2, 1, 4, 5, 3)$. To this realization we can associate the following realization of a string of Bernoulli random variables:

$00000100101000000000001100000000000000001111101010010000011011000100001001$

Where $1$ denotes success. Note that the number of $1$'s in this string is equal to the number of geometric random variables. We are interested in the minimal number of geometric random variables (the minimal number of $1$'s) such that the string up to and including the last $1$ has length $\geq 5000$. Consider the first $5000$ entries in the string. Clearly, the number of $1$'s in the first $5000$ entries (or trials) can be written as a binomial random variable.

Let $Y$ denote the number of $1$'s in the first $5000$ entries of the random string that is obtained from the geometric random variables as described above. $Y$ is a random variable and we have $Y\sim \text{Bin}(5000,p)$. Denote by $Y_{5000}$ the outcome of the $5000$'th Bernoulli trial.

Claim. We have the following equality

$$N=Y \mathbf{1}({Y_{5000}=1})+(Y+1) \mathbf{1}(Y_{5000}=0), \qquad (1)$$

where $\mathbf{1}()$ is an indicator function. In particular, $$\mathsf E(N) = 4999p+1.$$

Proof. The reasoning is as follows. Consider the first $5000$ trials. Either we have a success on the $5000$'th trial, in which case we have that exactly $Y$ geometric RV have been added up to reach $5000$, and hence $N=Y$; or we do not have a success on the 5000'th trial. In this case we have to wait until the next success occurs for the current geometric RV, which will add $1$ to the tally of $Y$. So in this case we must have $N=Y+1$.

Rewriting $(1)$ as

$$N=Y+\mathbf{1}(Y_{5000}=0).$$

And taking expectations gives the result.
$\square$

Related Question