Probability – Derivation of Mean and Variance of Hypergeometric Distribution

hypergeometric function, means, probability distributions

I need a clear and detailed derivation of the mean and variance of the hypergeometric distribution.

Suppose a box contains $N$ balls, of which $a$ are black and $N-a$ are white, and $n$ balls are drawn at random without replacement. Then the probability of getting $x$ black balls (and hence $n-x$ white balls) is given by the following p.m.f.

The p.m.f is $$f(x) =\frac{(_{a}C_x) \cdot (_{N-a}C_{n-x})}{_{N}C_n} $$

The mean is given by: $$ \mu = E(x) = np = na/N$$
and the variance by: $$ \sigma^2 = E(x^2)-[E(x)]^2 = \frac{na(N-a)(N-n)}{N^2(N-1)} = npq \left[\frac{N-n}{N-1}\right] $$
where $$ q = 1-p = (N-a)/N$$
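
As a sanity check on the formulas above, here is a minimal Python sketch that computes the mean and variance directly from the p.m.f. using exact rational arithmetic and compares them with $np$ and $npq\left[\frac{N-n}{N-1}\right]$. The function names `hypergeom_pmf` and `moments` and the sample values $N=20$, $a=7$, $n=5$ are illustrative assumptions, not part of the original problem.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x, N, a, n):
    """P(X = x): x black balls among n drawn from N balls, a of them black."""
    return Fraction(comb(a, x) * comb(N - a, n - x), comb(N, n))


def moments(N, a, n):
    """Mean and variance obtained by summing over the support of the p.m.f."""
    # x cannot be below n - (N - a) (not enough white balls) or above min(n, a).
    support = range(max(0, n - (N - a)), min(n, a) + 1)
    mean = sum(x * hypergeom_pmf(x, N, a, n) for x in support)
    second = sum(x * x * hypergeom_pmf(x, N, a, n) for x in support)
    return mean, second - mean ** 2


# Illustrative values only.
N, a, n = 20, 7, 5
mean, var = moments(N, a, n)
p = Fraction(a, N)
q = 1 - p
assert mean == n * p
assert var == n * p * q * Fraction(N - n, N - 1)
```

Because `Fraction` is exact, the equalities are checked without floating-point tolerance; this confirms the formulas for one parameter choice but is not a proof.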

I want the step by step procedure to derive the mean and variance. Thank you.

Best Answer

This is a rather old question, but the computation is worth revisiting. Let $$\Pr[X = x] = \frac{\binom{m}{x} \binom{N-m}{n-x}}{\binom{N}{n}},$$ where I have used $m$ instead of $a$. We can ignore the details of specifying the support if we adopt the convention that binomial coefficients evaluate to zero outside their usual range; e.g., $\binom{n}{k} = 0$ if $k \not\in \{0, \ldots, n\}$.

Then we observe the identity $$x \binom{m}{x} = \frac{m!}{(x-1)!(m-x)!} = \frac{m(m-1)!}{(x-1)!((m-1)-(x-1))!} = m \binom{m-1}{x-1},$$ whenever both binomial coefficients exist. The same identity applied to the denominator gives $\binom{N}{n} = \frac{N}{n}\binom{N-1}{n-1}$, and we can rewrite $\binom{N-m}{n-x} = \binom{(N-1)-(m-1)}{(n-1)-(x-1)}$. Thus $$x \Pr[X = x] = m \frac{\binom{m-1}{x-1} \binom{(N-1)-(m-1)}{(n-1)-(x-1)}}{\frac{N}{n}\binom{N-1}{n-1}},$$ and summing over $x$ we see that $$\operatorname{E}[X] = \frac{mn}{N} \sum_x \frac{\binom{m-1}{x-1} \binom{(N-1)-(m-1)}{(n-1)-(x-1)}}{\binom{N-1}{n-1}}.$$ The sum is simply the sum of probabilities for a hypergeometric distribution with parameters $N-1$, $m-1$, $n-1$, so it is equal to $1$. Therefore, the expectation is $\operatorname{E}[X] = mn/N$.

To get the second moment, consider $$x(x-1)\binom{m}{x} = m(x-1)\binom{m-1}{x-1} = m(m-1) \binom{m-2}{x-2},$$ which is just an iteration of the first identity we used. Consequently $$x(x-1)\Pr[X = x] = \frac{m(m-1)\binom{m-2}{x-2}\binom{(N-2)-(m-2)}{(n-2)-(x-2)}}{\frac{N(N-1)}{n(n-1)}\binom{N-2}{n-2}},$$ and again by the same reasoning, we find $$\operatorname{E}[X(X-1)] = \frac{m(m-1)n(n-1)}{N(N-1)}.$$

It is now quite easy to see that the "factorial moment" is $$\operatorname{E}[X(X-1)\ldots(X-k+1)] = \prod_{j=0}^{k-1} \frac{(m-j)(n-j)}{N-j}.$$ In fact, we can write this in terms of binomial coefficients as well: $$\operatorname{E}\left[\binom{X}{k}\right] = \frac{\binom{m}{k} \binom{n}{k}}{\binom{N}{k}}.$$

This gives us a way to recover raw and central moments; e.g., $$\operatorname{Var}[X] = \operatorname{E}[X^2] - \operatorname{E}[X]^2 = \operatorname{E}[X(X-1) + X] - \operatorname{E}[X]^2 = \operatorname{E}[X(X-1)] + \operatorname{E}[X](1-\operatorname{E}[X]),$$ so $$\operatorname{Var}[X] = \frac{m(m-1)n(n-1)}{N(N-1)} + \frac{mn}{N}\left(1 - \frac{mn}{N}\right) = \frac{mn}{N}\left[\frac{(m-1)(n-1)}{N-1} + 1 - \frac{mn}{N}\right].$$ Over the common denominator $N(N-1)$, the bracket has numerator $N(m-1)(n-1) + N(N-1) - mn(N-1) = N^2 - Nm - Nn + mn = (N-m)(N-n)$, hence $$\operatorname{Var}[X] = \frac{mn(N-m)(N-n)}{N^2 (N-1)}.$$ What is nice about the above derivation is that the formula for the expectation of $\binom{X}{k}$ is very simple to remember.
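
The binomial-moment formula above can be spot-checked numerically. The following is a small Python sketch under stated assumptions: the helper names `binomial_moment` and `closed_form` and the parameters $N=12$, $m=5$, $n=4$ are made up for illustration. It computes $\operatorname{E}\left[\binom{X}{k}\right]$ by brute-force summation over the p.m.f., compares it with $\binom{m}{k}\binom{n}{k}/\binom{N}{k}$, and then recovers the variance from the $k=1$ and $k=2$ moments.

```python
from fractions import Fraction
from math import comb


def binomial_moment(N, m, n, k):
    """E[C(X, k)] by direct summation over the hypergeometric p.m.f."""
    total = Fraction(0)
    for x in range(max(0, n - (N - m)), min(n, m) + 1):
        pmf = Fraction(comb(m, x) * comb(N - m, n - x), comb(N, n))
        total += comb(x, k) * pmf
    return total


def closed_form(N, m, n, k):
    """The claimed formula C(m, k) * C(n, k) / C(N, k)."""
    return Fraction(comb(m, k) * comb(n, k), comb(N, k))


# Illustrative parameters only.
N, m, n = 12, 5, 4
for k in range(n + 1):
    assert binomial_moment(N, m, n, k) == closed_form(N, m, n, k)

# Var[X] = E[X(X-1)] + E[X] - E[X]^2, where E[X(X-1)] = 2 * E[C(X, 2)].
mean = closed_form(N, m, n, 1)
variance = 2 * closed_form(N, m, n, 2) + mean - mean ** 2
assert variance == Fraction(m * n * (N - m) * (N - n), N ** 2 * (N - 1))
```

Exact `Fraction` arithmetic is used so the comparison is an equality rather than an approximate match; a single parameter set only illustrates the identity and does not replace the derivation.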
