Expectation calculation of hypergeometric distribution with sampling without replacement

expected value, probability, probability theory, sampling

Given a population of $n$ black and $m$ green balls, the expected number of black balls in a random sample of $r$ balls (drawn without replacement) can be calculated as follows:

We define a random variable $X_k$ taking the value 1 or 0 according as the $k$-th element in the sample is black or not. Then $P(X_k = 1) = \frac{n}{n+m}$, and the expected number of black balls in a random sample of $r$ balls is $\frac{nr}{n+m}$. This is the answer given in Feller's book (Vol. 1). The reason given by Feller is "For reasons of symmetry".
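For a small case, the claim can be checked by brute force. The following is a minimal sketch, not taken from Feller: the function name `position_probabilities` and the example values $n=3$, $m=2$, $r=3$ are chosen here only for illustration. It enumerates every equally likely ordered sample and counts how often each position holds a black ball.

```python
from fractions import Fraction
from itertools import permutations

def position_probabilities(n, m, r):
    """Exact P(X_k = 1) for k = 1..r, found by enumerating all ordered samples.

    Hypothetical helper for illustration: balls are labeled 0..n+m-1,
    the first n labels are black (1) and the remaining m are green (0).
    """
    colors = [1] * n + [0] * m
    counts = [0] * r   # counts[k] = number of ordered samples with a black ball at position k
    total = 0          # total number of ordered samples of size r
    for draw in permutations(range(n + m), r):
        total += 1
        for k, ball in enumerate(draw):
            counts[k] += colors[ball]
    return [Fraction(c, total) for c in counts]

# Illustrative values: 3 black, 2 green, sample of 3.
probs = position_probabilities(3, 2, 3)
expected_black = sum(probs)  # linearity of expectation: E[X_1 + ... + X_r]
print(probs, expected_black)
```

In this small case every position has the same marginal probability $\frac{n}{n+m}$, even though the conditional probabilities given earlier draws do change.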

But I am not convinced by this approach. How can we assume a constant probability $P(X_k = 1) = \frac{n}{n+m}$? The sampling is without replacement, so the probability of success changes from trial to trial. I don't understand what is meant by "For reasons of symmetry".

Best Answer

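Let $S_k:=\sum_{i=1}^k X_i$ be the number of black balls among the first $k$ draws. Then for $k<n+m$, conditioning on $S_k$ gives
\begin{align}
\mathsf{P}(X_{k+1}=1)&=\sum_{i=0}^k\mathsf{P}(S_k=i)\mathsf{P}(X_{k+1}=1\mid S_k=i) \\
&=\sum_{i=0}^k\frac{\binom{n}{i}\binom{m}{k-i}}{\binom{n+m}{k}}\times\frac{n-i}{n+m-k} \\
&=\frac{n}{n+m}\sum_{i=0}^k\frac{\binom{n-1}{i}\binom{m}{k-i}}{\binom{n+m-1}{k}}=\frac{n}{n+m}.
\end{align}
The third line uses $(n-i)\binom{n}{i}=n\binom{n-1}{i}$ and $(n+m-k)\binom{n+m}{k}=(n+m)\binom{n+m-1}{k}$. The remaining sum equals 1 because its terms are the hypergeometric probabilities for a sample of size $k$ from $n-1$ black and $m$ green balls (Vandermonde's identity). So although the conditional probability of drawing black changes from trial to trial, the unconditional probability $\mathsf{P}(X_k=1)$ is the same for every $k$, and linearity of expectation gives $\mathsf{E}[S_r]=\frac{nr}{n+m}$.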
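The total-probability sum above can also be evaluated exactly for concrete numbers. This is a small sketch under stated assumptions: the function name `prob_next_black` and the test values $n=3$, $m=2$ are illustrative choices, and it assumes $k < n+m$ so that at least one ball remains to be drawn.

```python
from fractions import Fraction
from math import comb  # available in Python 3.8+

def prob_next_black(n, m, k):
    """P(X_{k+1} = 1) computed by conditioning on S_k, the number of
    black balls among the first k draws. Assumes 0 <= k < n + m."""
    total = Fraction(0)
    for i in range(k + 1):
        # Hypergeometric probability P(S_k = i); it is zero when i > n or k - i > m.
        p_s = Fraction(comb(n, i) * comb(m, k - i), comb(n + m, k))
        if p_s == 0:
            continue
        # Conditional probability that draw k+1 is black given S_k = i.
        total += p_s * Fraction(n - i, n + m - k)
    return total

# Illustrative values: 3 black and 2 green balls.
results = [prob_next_black(3, 2, k) for k in range(5)]
print(results)
```

Each entry of `results` is the same fraction $\frac{n}{n+m}$, which is the content of the identity: the weights $\mathsf{P}(S_k=i)$ average the changing conditional probabilities back to a constant.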
