How to represent the sampling distribution of a random variable which has probability $\rho$ of being present

binomial distribution, expected value, probability, sampling

Say we have a continuous random variable $S\in [0,1]$ which has an unknown probability distribution $p_s$. Now suppose we want to find the expected value (and distribution) of the total outcome of $N$ trials, where in each trial we draw from $S$ with probability $\rho$ and get $0$ with probability $1-\rho$ (we want the expected value just in terms of $E[S]$).

In other words, in each trial we have probability $\rho$ of getting something, and that something is a random variable $S$ distributed according to $p_s$. The rest of the time we get nothing at all.

We can model the number of trials which return something as a standard binomial distribution. That is, out of $N$ trials the number of times we draw from $S$ is modeled by $B(N, \rho)$ where $B$ is the standard binomial distribution.

If $S$ were always 1, we would be done. The sampling distribution would just be $B(N, \rho)$ and the expected value would be $N\rho$ (correct me if I am wrong). But what about this case where $S$ has some unknown distribution? How do we model this sampling distribution?

Best Answer

Let $Y$ be the outcome of one trial. The distribution of $Y$ is $(1-\rho) \cdot \delta_0 + \rho \cdot p_s$, where $\delta_0$ is a point mass at $0$. Then after $N$ (independent, I'm assuming) trials the total outcome is $T_N := Y_1 + \dots + Y_N$, where $Y_1, \dots, Y_N$ are iid copies of $Y$.

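I'm not sure exactly what you mean by "model this distribution", but in particular the expectation is easy to calculate: conditioning on whether the trial returns something, $\mathbb{E}(Y) = (1-\rho) \cdot 0 + \rho \mathbb{E}(S) = \rho \mathbb{E}(S)$, so by linearity $\mathbb{E}(T_N) = N \rho \mathbb{E}(S)$.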

It's also possible to calculate the variance using the law of total variance. Let $I$ be the indicator of whether the trial samples from $p_s$ ($I=1$, with probability $\rho$) or returns $0$ ($I=0$, with probability $1-\rho$). Then $\operatorname{var}(Y|I)$ equals $\operatorname{var}(S)$ when $I=1$ and $0$ when $I=0$, and $\mathbb{E}(Y|I)$ equals $\mathbb{E}(S)$ when $I=1$ and $0$ when $I=0$, so \begin{align*} \operatorname{var}(Y) &= \mathbb{E}(\operatorname{var}(Y|I)) + \operatorname{var}(\mathbb{E}(Y|I)) \\ &= \rho \operatorname{var}(S) + \left(\rho \mathbb{E}(S)^2 - \rho^2 \mathbb{E}(S)^2\right) \\ &= \rho \operatorname{var}(S) + \rho (1-\rho) \mathbb{E}(S)^2, \end{align*} and, since the trials are independent, $\operatorname{var}(T_N) = N\cdot \operatorname{var}(Y)$.
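As a sanity check, the mean and variance formulas can be compared against a quick Monte Carlo simulation. The sketch below is only an illustration: since $p_s$ is unknown, it assumes $S \sim \operatorname{Beta}(2, 5)$, and the values of $N$, $\rho$, the seed and the helper names `simulate_total` and `theoretical_moments` are likewise made up for the example.

```python
import random
import statistics


def simulate_total(n_trials, rho, draw_s, rng):
    """Return one realization of T_N = Y_1 + ... + Y_N.

    Each trial contributes draw_s(rng) with probability rho and 0 otherwise.
    """
    total = 0.0
    for _ in range(n_trials):
        if rng.random() < rho:
            total += draw_s(rng)
    return total


def theoretical_moments(n_trials, rho, mean_s, var_s):
    """Mean and variance of T_N from the closed-form expressions."""
    mean_t = n_trials * rho * mean_s
    var_t = n_trials * (rho * var_s + rho * (1.0 - rho) * mean_s ** 2)
    return mean_t, var_t


# Assumed illustration values (not taken from the question).
N_TRIALS, RHO = 20, 0.3
ALPHA, BETA = 2.0, 5.0  # S ~ Beta(2, 5), which is supported on [0, 1]

# Known mean and variance of a Beta(ALPHA, BETA) distribution.
mean_s = ALPHA / (ALPHA + BETA)
var_s = ALPHA * BETA / ((ALPHA + BETA) ** 2 * (ALPHA + BETA + 1.0))

rng = random.Random(0)  # fixed seed so the run is reproducible
totals = [
    simulate_total(N_TRIALS, RHO, lambda r: r.betavariate(ALPHA, BETA), rng)
    for _ in range(20_000)
]

empirical_mean = statistics.fmean(totals)
empirical_var = statistics.variance(totals)
expected_mean, expected_var = theoretical_moments(N_TRIALS, RHO, mean_s, var_s)

print(f"mean: simulated {empirical_mean:.4f}, formula {expected_mean:.4f}")
print(f"var:  simulated {empirical_var:.4f}, formula {expected_var:.4f}")
```

The simulated and closed-form values should agree up to sampling noise. Swapping in a different `draw_s` (with matching `mean_s` and `var_s`) checks the formulas for another choice of $p_s$.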

EDIT: Here is an alternate way to calculate the same quantities. As you have already pointed out, we can also express $T_N$ as $\sum_{i=1}^{M} S_i$ where $M \sim \operatorname{Bin}(N,\rho)$ and the $S_i$ are iid copies of $S$, independent of $M$. Then $\mathbb{E}(T_N)$ and $\operatorname{var}(T_N)$ can be calculated using standard results for sums of a random number of iid random variables (see for example these lecture notes).
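For concreteness, here is how that calculation goes, assuming $M$ is independent of the $S_i$. The standard identities for a random sum are $\mathbb{E}(T_N) = \mathbb{E}(M)\,\mathbb{E}(S)$ and $\operatorname{var}(T_N) = \mathbb{E}(M)\operatorname{var}(S) + \operatorname{var}(M)\,\mathbb{E}(S)^2$. Plugging in $\mathbb{E}(M) = N\rho$ and $\operatorname{var}(M) = N\rho(1-\rho)$ gives \begin{align*} \mathbb{E}(T_N) &= N \rho\, \mathbb{E}(S), \\ \operatorname{var}(T_N) &= N \rho \operatorname{var}(S) + N \rho (1-\rho)\, \mathbb{E}(S)^2 = N \left( \rho \operatorname{var}(S) + \rho\, \mathbb{E}(S)^2 - \rho^2\, \mathbb{E}(S)^2 \right), \end{align*} which matches the per-trial calculation.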
