There is some ambiguity in your notation: it is not clear where the summation index $i$ belongs. That is, are we dealing with $s_i(X)$ or $s(X_i)$? If it is the latter, then which of the $\{i\}$ would $t$ depend on?
If we are dealing simply with $E\left(\sum_{i=0}^N X_i\right)$ where $N$ is a random variable independent of $\{X_i\}$, then this is known as Wald's identity.
The key idea is to use the law of total expectation as $$E\left(\sum_{i=0}^N X_i\right) = \sum_n P(N=n) \, E\left(\sum_{i=0}^n X_i\right) $$
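For concreteness, here is a minimal Monte Carlo sketch (my own addition, not part of the answer; the Poisson/exponential choices are arbitrary) checking Wald's identity when $N$ is independent of the $X_i$:

```python
# Sketch: verify E[sum_{i=1}^N X_i] = E[N] * E[X_i] when N is independent
# of the X_i.  N ~ Poisson(3) and X_i ~ Exponential(mean 2) are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
trials = 200_000

n = rng.poisson(lam=3.0, size=trials)                       # random number of terms
sums = np.array([rng.exponential(scale=2.0, size=k).sum() for k in n])

print("Monte Carlo E[sum] ≈", sums.mean())                  # ≈ 6
print("Wald E[N]*E[X]     =", 3.0 * 2.0)                    # = 6
```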
EDIT March 30, 2017 (based on comments and revised original Q):
Let's break down some of the combinations of cases:
1) $N$ independent of $\{X_i\}$, with the $\{X_i\}$ having a common mean: this is Wald's identity referenced above, i.e., the expectation of the random sum is $E[N]\,E[X_i]$.
2) $N$ not independent of $\{X_i\}$, but $N$ is a stopping time with $E[N] < \infty$, and $\{X_i\}$ is an independent sequence with a common finite mean, either identically distributed or uniformly bounded:
Wald's identity still holds in this case. The OP states it is not hard to show Wald's identity when the $\{X_i\}$ are iid; a review of Blackwell's 1946 paper in the Annals of Mathematical Statistics shows it is not trivial to prove, even assuming iid.
$N$ being a stopping time means, loosely speaking, that the event $\{N=n\}$ does not depend on $\{X_{n+1}, X_{n+2}, \ldots\}$. That is, $\{N=n\} \in \sigma\left(X_{1}, X_{2}, \ldots, X_{n}\right)$.
3) General case: the law of total expectation still applies in this scenario, although one may not be able to go much further without specifics about the dependence between $\{X_i\}$ and $N$. That is, one can certainly write:
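As an illustration of case 2 (my own example, not from the answer), take $X_i \sim \mathrm{Uniform}(0,1)$ and let $N$ be the first index at which the running sum exceeds $1$. This $N$ is a stopping time with $E[N] = e < \infty$, and a quick simulation suggests Wald's identity still holds even though $N$ depends on the $X_i$:

```python
# Sketch: N = first n with X_1 + ... + X_n > 1, a stopping time.
# Wald's identity predicts E[S_N] = E[N] * E[X_i], both ≈ e/2 ≈ 1.359 here.
import numpy as np

rng = np.random.default_rng(1)
trials = 100_000
stop_times, stopped_sums = [], []

for _ in range(trials):
    s, n = 0.0, 0
    while s <= 1.0:                    # {N = n} depends only on X_1, ..., X_n
        s += rng.random()
        n += 1
    stop_times.append(n)
    stopped_sums.append(s)

print("E[S_N]        ≈", np.mean(stopped_sums))
print("E[N] * E[X_i] ≈", np.mean(stop_times) * 0.5)
```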
$$ E\left(\sum_{i=1}^{N}s_i(X)\right) = \sum_{n=1}^{\infty} P(N=n) \, E\left(\sum_{i=1}^{n}s_i(X) \, \middle| \, N=n\right), \qquad N = t(X),$$
according to the law of total expectation, but one may need to deal with the right-hand side on a case-by-case basis.
One can also rewrite the above using alternative forms for the expectation of a discrete non-negative random variable (e.g., Intuition behind using complementary CDF to compute expectation for nonnegative random variables).
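For instance, here is a quick numeric check of the complementary-CDF form (my own addition, using an arbitrary geometric example):

```python
# Sketch: for integer-valued N >= 0, E[N] = sum_{n>=1} P(N >= n).
# Example: N geometric on {1, 2, ...} with success probability p = 0.3, so E[N] = 1/p.
p = 0.3
pmf = [p * (1 - p) ** (n - 1) for n in range(1, 200)]        # truncated tail

mean_direct = sum(n * pmf[n - 1] for n in range(1, 200))     # sum n * P(N = n)
mean_ccdf   = sum(sum(pmf[n - 1:]) for n in range(1, 200))   # sum P(N >= n)

print(mean_direct, mean_ccdf, 1 / p)                          # all ≈ 3.333
```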
So then is $E[X^2]$ equal to $\sum_i (iP(X=i))^2$? Or is it $\sum_i i^2 P(X=i^2)$ or $\sum_i i^2 P(X=i)$?
It is indeed the last one, as you suspect.
Here's what's happening: for any discrete random variable $X$, the expected value $\mathbb E[X]$ is $\sum i \mathbb P(X = i)$, as you noted. To take a particular example, consider the variable $X$ that can be $1$ with probability $1/2$, can be $2$ with probability $1/4$, and can be $5$ with probability $1/4$. I think you're already comfortable with the idea that for this variable,
$$\mathbb E[X] = 1 \cdot \frac 1 2 + 2 \cdot \frac 1 4 + 5 \cdot \frac 1 4 = \frac 9 4.$$
Now, let's consider another variable, $Y = X^2$. (That is, this variable is nothing more than the square of $X$, but I'm going to call it $Y$.) Just like you already know, we can find $\mathbb E[Y]$ as $\sum i \cdot \mathbb P(Y = i)$. But let's think carefully about what $Y$ can be here:
- In the case that $X = 1$, then $Y = 1$. This occurs with probability $1/2$.
- In the case that $X = 2$, then $Y = 4$. This occurs with probability $1/4$.
- In the case that $X = 5$, then $Y = 25$. This occurs with probability $1/4$.
So, we must have
$$\mathbb E[Y] = 1 \cdot \frac 1 2 + 4 \cdot \frac 1 4 + 25 \cdot \frac 1 4 = \frac{31}{4}$$
for the same reasons as above. Note that this turns out to be nothing more than just the last answer you pitched above, $\sum i^2 \mathbb P(X = i)$.
You're asking for intuition about why this formula works; my best attempt at that is outlined above. If you want to compute $\mathbb E[f(X)]$ for some function $f$, then your task is to imagine $f(X)$ as a new random variable in its own right. Its distribution is "shaped" similarly to that of $X$ -- that is, it carries the same probabilities, just attached to different values. (Specifically, the values are those that $X$ can take, pushed through the function $f$.) From that perspective, the logic behind the formula is hopefully clear.
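If it helps, here is a tiny sketch (using the same toy distribution as above) that treats $Y = X^2$ as its own random variable and recovers $\sum_i i^2 \,\mathbb P(X=i)$:

```python
# Sketch: X takes values 1, 2, 5 with probabilities 1/2, 1/4, 1/4.
# Y = X^2 keeps the same probabilities, attached to the squared values.
values = [1, 2, 5]
probs  = [0.5, 0.25, 0.25]

E_X = sum(x * p for x, p in zip(values, probs))            # 9/4
E_Y = sum((x ** 2) * p for x, p in zip(values, probs))     # 31/4 = sum_i i^2 P(X = i)

print(E_X, E_Y)   # 2.25 7.75
```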
For coin tosses, I think what you meant to say was
$E[X_i^2] = E[X_i]= \frac 1 2$
where the exponent is on the inside of the expectation. Here's what's going on there; we usually think of a coin flip as a variable $X_i$ that will be either $1$ or $0$, each with probability $1/2$. Under that scenario, note that $X_i^2 = X_i$, since $1^2 = 1$ and $0^2 = 0$. Thus,
$$\mathbb E[X_i^2] = 1^2 \cdot \frac 1 2 + 0^2 \cdot \frac 1 2 = \frac 1 2$$
which is the same calculation as we'd get for $\mathbb E[X_i]$.
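A quick simulation (my own addition) makes the same point: squaring the flips changes nothing, so both sample means hover around $1/2$:

```python
# Sketch: fair coin flips coded as 0/1; squaring leaves them unchanged.
import numpy as np

rng = np.random.default_rng(2)
flips = rng.integers(0, 2, size=100_000)      # values in {0, 1}
print(flips.mean(), (flips ** 2).mean())      # both ≈ 0.5, since 0^2 = 0 and 1^2 = 1
```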
Summing up the comment thread. Given a deterministic product of $n$ independent RVs $Y_i$, one has $$\mathbb{E}\left ( \prod_{i=1}^n Y_i\right)=\prod_{i=1}^n \mathbb{E}(Y_i).$$ If the number of factors is itself an RV, say $N$, also independent of the $Y_i$, then one can use the law of total expectation to compute the mean: $$\mathbb{E}\left(\prod_{i=1}^N Y_i \right)=\mathbb{E}\left[\mathbb{E}\left(\prod_{i=1}^N Y_i \,\middle|\, N\right)\right].$$ When the $Y_i$ are IID, the inner conditional expectation simplifies in the usual way to $(\mathbb{E} Y_1)^N$ before the outer expectation over $N$ is taken. In the posted image it appears this is what they are doing by first conditioning on $Z(\Delta t)=K$. Comment for further clarification or any potential errors/typos!
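As a sanity check (my own sketch, with arbitrary distribution choices), one can verify numerically that the mean of the random product matches $\mathbb{E}\left[(\mathbb{E} Y_1)^N\right]$ when $N \sim \mathrm{Poisson}(2)$ and $Y_i \sim \mathrm{Uniform}(0,1)$; the exact value is $e^{2(1/2 - 1)} = e^{-1}$:

```python
# Sketch: E[prod_{i=1}^N Y_i] = E[(E[Y_1])^N] for N independent of IID Y_i.
# With N ~ Poisson(2) and Y_i ~ Uniform(0,1), the exact answer is exp(-1).
import numpy as np

rng = np.random.default_rng(3)
trials = 200_000

n = rng.poisson(lam=2.0, size=trials)
prods = np.array([rng.random(k).prod() for k in n])   # empty product = 1 when N = 0

print("Monte Carlo E[prod] ≈", prods.mean())
print("exp(-1)             =", np.exp(-1))            # ≈ 0.3679
```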