In actuarial science, this is the basic form of a collective risk model. A discrete distribution generates an annual loss frequency, and each of those losses has a severity drawn from some continuous distribution. The sum of those losses is the aggregate loss that the insurer/reinsurer/homeowner/etc. faces.
Somewhat more formally, if $N$ is a discrete random variable taking values in $\{0, 1, 2, \ldots\}$, and $X$ is a continuous random variable, we can define $S$ as the aggregate loss random variable, where $S = X_1 + X_2 + \ldots + X_N$.
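As a concrete illustration, here is a minimal Monte Carlo sketch of one draw of $S$. The Poisson frequency and gamma severity are illustrative assumptions on my part, not choices made in the text; any discrete/continuous pair works the same way.

```python
import numpy as np

rng = np.random.default_rng(42)

def draw_aggregate_loss(freq_mean=3.0, sev_shape=2.0, sev_scale=500.0):
    """One draw of S = X_1 + ... + X_N under a collective risk model.

    Assumed (illustrative) choices: N ~ Poisson(freq_mean),
    X_i ~ Gamma(sev_shape, sev_scale), all independent.
    """
    n = rng.poisson(freq_mean)                            # annual claim count N
    severities = rng.gamma(sev_shape, sev_scale, size=n)  # X_1, ..., X_N
    return severities.sum()                               # S (0.0 when N = 0)

print(draw_aggregate_loss())
```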
Given some basic independence assumptions (namely, that the $X_i$, jointly and singly, do not depend on $N$, and vice versa), it can be shown that:
$$
\begin{aligned}
E(S) &= E(N)\,E(X) \\
Var(S) &= E(N)\,Var(X) + Var(N)\,E(X)^2
\end{aligned}
$$
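These identities are easy to sanity-check by simulation. The sketch below reuses the illustrative Poisson/gamma assumptions from above and compares the sample moments of $S$ against the closed forms.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, shape, scale = 3.0, 2.0, 500.0   # illustrative Poisson/gamma parameters
trials = 100_000

counts = rng.poisson(lam, size=trials)  # one claim count N per simulated year
s = np.array([rng.gamma(shape, scale, size=n).sum() for n in counts])

ex, varx = shape * scale, shape * scale**2  # gamma mean and variance
en, varn = lam, lam                         # Poisson mean = variance

print("E(S):   simulated %.0f   vs formula %.0f" % (s.mean(), en * ex))
print("Var(S): simulated %.3e vs formula %.3e" % (s.var(), en * varx + varn * ex**2))
```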
The proof is based on convolution of the probability generating functions; see Klugman et al. (1998, pp. 295-298).
As soakley showed below, the first identity is relatively simple to derive. The second can be understood via the law of total variance, which is:
$$
Var(X) = E_Y[Var(X|Y)] + Var_Y[E(X|Y)]
$$
or, in English: the total variance is the expected value of the conditional variance plus the variance of the conditional expected value. See Heckman and Meyers (1983, p. 30) for this derivation.
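To see the decomposition in action, here is a small numeric check on a made-up mixture (my own illustrative example, not one from the text): $Y \sim \text{Bernoulli}(0.3)$, with $X|Y=0 \sim N(0,1)$ and $X|Y=1 \sim N(2,4)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Made-up mixture: Y ~ Bernoulli(0.3); X|Y=0 ~ N(0, 1), X|Y=1 ~ N(2, 4).
y = rng.random(n) < 0.3
x = np.where(y, rng.normal(2.0, 2.0, n), rng.normal(0.0, 1.0, n))

e_cond_var = 0.7 * 1.0 + 0.3 * 4.0              # E_Y[Var(X|Y)]
var_cond_mean = 0.3 * 2.0**2 - (0.3 * 2.0)**2   # Var_Y[E(X|Y)]

print("Var(X) by simulation:        %.3f" % x.var())
print("decomposition, analytically: %.3f" % (e_cond_var + var_cond_mean))
```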
Derivation of the variance (added in an edit)
By the law of total variance, $Var(S) = E[Var(S|N)] + Var[E(S|N)]$. Now, given $N = n$, $S$ is merely $X_1 + X_2 + \ldots + X_n$. So let's rewrite:
$$
Var(S) = E[Var(X_1 + X_2 + \ldots + X_N \mid N)] + Var[E(X_1 + X_2 + \ldots + X_N \mid N)]
$$
One of the assumptions behind the compound variance formula, which you also mentioned in your question, is that the $X_i$ are iid and independent of $N$. Therefore, in the first term, $Var(X_1 + X_2 + \ldots + X_N \mid N)$ collapses to $N\cdot Var(X)$. Similarly, in the second term, $E(X_1 + X_2 + \ldots + X_N \mid N)$ simply becomes $N\cdot E(X)$, so we now have:
$$
Var(S) = E[N\cdot Var(X)] + Var[N\cdot E(X)]
$$
To complete the conditioning, we need to take the expectation and variance over $N$. But $E(X)$ and $Var(X)$ are constants with respect to $N$, so we get:
$$
Var(S) = E(N)\,Var(X) + Var(N)\,E(X)^2
$$
since the constant $E(X)$ comes out of the variance squared, while the constant $Var(X)$ comes out of the expectation unchanged. This demonstrates the second relationship.
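As a quick sanity check of the final formula, consider the compound Poisson special case (a standard example, not one used above): if $N \sim \text{Poisson}(\lambda)$, then $E(N) = Var(N) = \lambda$, so
$$
Var(S) = \lambda\,Var(X) + \lambda\,E(X)^2 = \lambda\,E(X^2),
$$
which is the familiar compound Poisson variance.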
Heckman, Philip E., and Glenn G. Meyers. "The Calculation of Aggregate Loss Distributions from Claim Severity and Claim Count Distributions." Proceedings of the Casualty Actuarial Society, Vol. 70, 1983.
Klugman, S. A., Panjer, H. H., and Willmot, G. E. Loss Models: From Data to Decisions. John Wiley & Sons, Inc., 1998.
If $f(x)\,dx$ is a probability distribution with expected value $0$ and variance $1$, and the distribution of $X_i$ is
$$f\left(\dfrac{x-\mu_i}{\sigma_i}\right)\cdot\dfrac{dx}{\sigma_i},$$
and $X_i$ are independent, then certainly the distribution of $X_1+\cdots+X_n$ has expected value $\mu_1+\cdots+\mu_n$ and variance $\sigma_1^2+\cdots+\sigma_n^2$. Also, the higher cumulants would add together in the same way. (The fourth cumulant, for example, is $\mathbb E((X-\mu)^4) - 3(\mathbb E((X-\mu)^2))^2$, and the coefficient $3$ is the only number that makes this functional additive in the sense that the fourth cumulant of a sum of independent random variables is the sum of their fourth cumulants.)
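Here is a quick numeric check of that additivity, using scipy's k-statistics (unbiased cumulant estimators); the exponential and gamma inputs are my own illustrative choices.

```python
import numpy as np
from scipy.stats import kstat  # k-statistics: unbiased cumulant estimators

rng = np.random.default_rng(7)
n = 2_000_000

# Two independent samples from different (illustrative) distributions.
a = rng.exponential(1.0, n)  # 4th cumulant of Exp(1) is 3! = 6
b = rng.gamma(3.0, 1.0, n)   # 4th cumulant of Gamma(3, 1) is 3 * 3! = 18

k4_sum = kstat(a + b, 4)              # 4th cumulant of the sum
k4_parts = kstat(a, 4) + kstat(b, 4)  # sum of the 4th cumulants

print("k4(a+b) = %.2f, k4(a) + k4(b) = %.2f, theory = 24" % (k4_sum, k4_parts))
```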
We've tacitly assumed $\sigma_i<\infty$. I think if $\sum_{i=1}^\infty\sigma_i^2=\infty$, then as $n$ grows, the suitably normalized distribution would approach a normal distribution (the Lindeberg-Feller central limit theorem is the relevant generalization, and its conditions require somewhat more than divergence of the variance sum). But what happens for small $n$ is another question, and the answer would depend on what function $f$ is.
I said above that $f(x)\,dx$ has expectation $0$ and variance $1$. But one can also have perfectly good location-scale families in which the expectation, and a fortiori the variance, do not exist. The best-known case is the Cauchy distribution. The simplest result there is that $(X_1+\cdots+X_n)/n$ actually has the same Cauchy distribution as $X_1$ if these $n$ variables are i.i.d. It doesn't get narrower. So a lot depends on which function $f$ is.
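A quick simulation makes the Cauchy claim vivid: the sample mean of $n$ i.i.d. standard Cauchy variables has the same spread as a single draw (the quantile comparison below is my own illustrative check).

```python
import numpy as np

rng = np.random.default_rng(3)
trials, n = 100_000, 100

one = rng.standard_cauchy(trials)                          # single Cauchy draws
mean_of_n = rng.standard_cauchy((trials, n)).mean(axis=1)  # means of n i.i.d. draws

qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print("quantiles of X_1:        ", np.round(np.quantile(one, qs), 2))
print("quantiles of mean of 100:", np.round(np.quantile(mean_of_n, qs), 2))
# The two rows agree up to noise: averaging does not narrow a Cauchy.
```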
Best Answer
That expression is usually false for $n \gt 1$.
For example, if $W$ is independent of all the $X_i$ and $P(W)=p$, then the left-hand side is $p$ while the right-hand side is $p^n$, which is strictly smaller whenever $0 \lt p \lt 1$ (with $p = 1/2$ and $n = 3$, the two sides are $1/2$ and $1/8$).