Binomial random variable conditional on another one

binomial distribution, conditional probability

The Wikipedia page for the binomial distribution mentions the following property under its related distributions section (paraphrased):

If $X \sim \text{Bin}(n,p)$ and $Y \mid X \sim \text{Bin}(X,q)$, then $Y \sim \text{Bin}(n,pq)$.

I interpret this the following way. The probability mass function for $X$ is:
$$P(X=x) = \binom{n}{x}p^x(1-p)^{n-x}$$
The conditional mass function for $Y$ given $X=x$ is:
$$P(Y=y|X=x) = \binom{x}{y}q^{y} (1-q)^{x-y}$$
The claim is then that the marginal mass function of $Y$ is:
$$P(Y=y) = \binom{n}{y} (pq)^y (1-pq)^{n-y}$$

There is no citation for this particular property. I have tried to prove it, but to no avail. I wrote the following R code to get a sense of the veracity of the claim.

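# Number of (X, Y) observations to generate
obs <- 10000

n <- 10
p <- 0.6
q <- 0.4

# Draw X ~ Bin(n, p), then Y | X ~ Bin(X, q).
# rbinom is vectorized over its size argument, so no loop is needed.
X <- rbinom(obs, n, p)
Y <- rbinom(obs, X, q)

# One bin per integer value 0, ..., n
bins <- seq(-0.5, n + 0.5, by = 1)

# Simulated pmf of Y
hist(Y, breaks = bins, freq = FALSE)

# Sample drawn directly from the claimed distribution Bin(n, pq)
Y_theoretical <- rbinom(obs, n, p * q)
hist(Y_theoretical, breaks = bins, freq = FALSE)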

The two histograms generated are shown below:
[Figure: histogram of the simulated distribution of Y]
[Figure: histogram of the claimed distribution of Y]

The two histograms look nearly identical for this choice of $p$ and $q$.
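A sharper check than comparing histograms by eye is to compute the marginal pmf of $Y$ exactly from the law of total probability, $P(Y=y) = \sum_{x} P(Y=y|X=x)\,P(X=x)$, and compare it with the claimed $\text{Bin}(n,pq)$ pmf. The sketch below does this for the same $n$, $p$ and $q$ as in the simulation; the variable names `marginal_pmf`, `claimed_pmf` and `max_abs_diff` are illustrative choices, not part of the original code.

```r
# Parameters copied from the simulation in the question
n <- 10
p <- 0.6
q <- 0.4

# Exact marginal pmf of Y via the law of total probability:
# P(Y = y) = sum over x of P(Y = y | X = x) * P(X = x)
marginal_pmf <- sapply(0:n, function(y) {
  x <- y:n  # Y cannot exceed X, so only x >= y contributes
  sum(dbinom(y, size = x, prob = q) * dbinom(x, size = n, prob = p))
})

# Claimed pmf: Bin(n, pq) evaluated at y = 0, ..., n
claimed_pmf <- dbinom(0:n, size = n, prob = p * q)

# Largest absolute difference between the two pmfs
max_abs_diff <- max(abs(marginal_pmf - claimed_pmf))
print(max_abs_diff)
```

If the claim holds, the printed difference should be on the order of floating-point rounding error. This only confirms the identity numerically for one parameter choice; it is not a proof.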

Can a proof of this claim be provided?

Best Answer

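Let $X = \sum_{i=1}^{n} X_i$, with $X_i \overset{iid}{\sim} \text{Bin}(1, p)$, and $Z = \sum_{i=1}^{n} Z_i$, with $Z_i \overset{iid}{\sim} \text{Bin}(1, q)$. If all the $X_i$ and $Z_i$ are mutually independent, then conditioning on the $X_i$ does not change the distribution of the $Z_i$: $Z_i \mid X_i \overset{iid}{\sim} \text{Bin}(1, q)$.

Now to construct $Y$ we want to throw out all the $(X_i, Z_i)$ pairs where $X_i=0$ and then count the number of times $Z_i=1$ in the remaining pairs. Given $X = x$, exactly $x$ pairs remain, each with an independent $\text{Bin}(1, q)$ value of $Z_i$, so $Y \mid X = x \sim \text{Bin}(x, q)$, as required. We can also write $Y = \sum_{i=1}^{n} Y_i$ with $Y_i = X_i Z_i$. We know $X_i Z_i=1$ if $X_i=1$ and $Z_i=1$, otherwise it is 0, so by independence $P(Y_i = 1) = P(X_i = 1)\,P(Z_i = 1) = pq$. The $Y_i$ are independent because the pairs $(X_i, Z_i)$ are. Thus $Y_i \overset{iid}{\sim} \text{Bin}(1, pq)$, and $Y \sim \text{Bin}(n, pq)$.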
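As a cross-check on the construction above, the claim can also be verified by direct summation over the values of $X$, using the mass functions written out in the question. This sketch relies on the standard identity $\binom{n}{x}\binom{x}{y} = \binom{n}{y}\binom{n-y}{x-y}$ and the substitution $k = x - y$ (the index $k$ is introduced here only for the computation):

$$
\begin{aligned}
P(Y=y) &= \sum_{x=y}^{n} P(Y=y \mid X=x)\,P(X=x) \\
&= \sum_{x=y}^{n} \binom{x}{y} q^{y}(1-q)^{x-y} \binom{n}{x} p^{x}(1-p)^{n-x} \\
&= \binom{n}{y} (pq)^{y} \sum_{k=0}^{n-y} \binom{n-y}{k} \bigl[p(1-q)\bigr]^{k} (1-p)^{n-y-k} \\
&= \binom{n}{y} (pq)^{y} \bigl[p(1-q) + (1-p)\bigr]^{n-y} \\
&= \binom{n}{y} (pq)^{y} (1-pq)^{n-y}.
\end{aligned}
$$

The fourth line is the binomial theorem applied to the sum over $k$, and the last line uses $p(1-q) + (1-p) = 1 - pq$. The result is the $\text{Bin}(n, pq)$ mass function, in agreement with the argument via the $Y_i$.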
