Limit of hypergeometric distribution when sample size grows with population size

Tags: probability-distributions, probability-theory

Consider choosing $Mn/6$ balls from a population consisting of $M$ balls of each of $n$ colors ($Mn$ balls in total). The density function of the sample is then given by a multivariate hypergeometric distribution: $$f(x_1,\ldots, x_n) = \frac{\binom{M}{x_1}\cdots\binom{M}{x_n}}{\binom{Mn}{Mn/6}}.$$ Can anything be said about the limiting behavior of this distribution as $M\to\infty$, with the number of colors $n$ fixed? Since the sample size grows at the same rate as the population size, the distribution does not converge to a binomial/multinomial distribution, as it would if the sample size were held fixed. Any help is appreciated! (The $1/6$ in $Mn/6$ is arbitrary; I'm curious in general about the case where the sample size is a fixed fraction of the population size.)
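For a quick empirical look at the scaling, here is a minimal NumPy sketch (the choices $n = 4$, $M = 6000$, and the replicate count are mine, purely for illustration; `Generator.multivariate_hypergeometric` requires NumPy 1.18+):

```python
import numpy as np

# Illustrative parameters (not from the question): n = 4 colors, M = 6000 balls
# per color, and we draw a fixed 1/6 fraction of the M*n balls.
n, M = 4, 6000
rng = np.random.default_rng(0)

colors = np.full(n, M)       # M balls of each of the n colors
nsample = M * n // 6         # sample size = 1/6 of the population

# Each row of `draws` is one count vector (x_1, ..., x_n).
draws = rng.multivariate_hypergeometric(colors, nsample, size=20_000)

print("mean counts:", draws.mean(axis=0))   # ~ M/6 = 1000 for each color
print("std of counts:", draws.std(axis=0))  # fluctuations grow like sqrt(M)
```

The counts concentrate at $M/6$ with fluctuations of order $\sqrt{M}$, which is exactly the scaling the answer below exploits.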

I guess it wouldn't surprise me if nothing really useful can be said, in which case I have a related question. Consider the same scenario, but instead of starting with $M$ balls of each color, start with, say, $5M/6$ balls of each color. The modified density function is then: $$g(x_1,\ldots, x_n) = \frac{\binom{5M/6}{x_1}\cdots\binom{5M/6}{x_n}}{\binom{5Mn/6}{Mn/6}}.$$ As $M\to\infty$, is there any meaningful relationship between $f$ and $g$? It vaguely seems to me that as $M$ grows large the two densities should look more and more alike, but that intuition may well be off.
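One way to probe that intuition numerically: under both $f$ and $g$ the marginal of $x_1$ is an ordinary hypergeometric with mean $M/6$, so the interesting comparison is between the variances. A sketch using SciPy's `hypergeom` (the parameter choices here are mine):

```python
from scipy.stats import hypergeom

# Marginal of x_1 under f: population M*n, M of color 1, M*n/6 draws.
# Marginal of x_1 under g: population 5*M*n/6, 5*M/6 of color 1, same draws.
n = 4
for M in [600, 6000, 60000]:   # multiples of 6, so all counts are integers
    draws = M * n // 6
    var_f = hypergeom.var(M * n, M, draws)        # args: total, successes, draws
    var_g = hypergeom.var(5 * M * n // 6, 5 * M // 6, draws)
    print(M, var_f / M, var_g / M)
```

The scaled variances settle at different constants (the finite-population correction factors tend to $5/6$ under $f$ but $4/5$ under $g$), which suggests the two densities keep genuinely different shapes even as $M\to\infty$.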

Best Answer

Write $N$ for the number of colors (called $n$ in the question), reserving $n$ as a color index. For the $m$-th ball of color $n$, let $X_{m}^{n}$ be the indicator random variable for whether that ball was drawn. Suppose we draw a fraction $\mu \in (0,1)$ of the $MN$ balls (e.g. $\mu = 1/6$); then:

$$\mathbb{E}[X_{m}^{n}] = \mu$$

$$Var(X_{m}^{n}) = \mu(1-\mu) \equiv \sigma^{2}$$

For any $(m,n) \neq (m',n')$:

$$\begin{align} Cov(X^{n}_{m}, X^{n'}_{m'}) &= \mathbb{E}[X_{m}^{n}X_{m'}^{n'}]-\mu^{2} \\ &= \mu\cdot\frac{\mu MN - 1}{MN-1} - \mu^{2} \\ &= -\mu (1-\mu)/(MN-1) \\ &= -\sigma^{2}/(MN-1) \end{align}$$ (The second line is the probability that two fixed balls are both drawn: the first is drawn with probability $\mu$, and given that, the second must be among the remaining $\mu MN - 1$ draws out of $MN - 1$ balls.)
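This covariance can be checked exactly by brute force on a toy population (my own tiny parameters: $N = 2$ colors, $M = 3$ balls per color, $\mu = 1/3$):

```python
from itertools import combinations

# Enumerate every equally likely sample of size mu*M*N = 2 from M*N = 6 balls.
M, N, ndraws = 3, 2, 2
subsets = list(combinations(range(M * N), ndraws))

def indicator(ball):
    # Realizations of the indicator X for one fixed ball, across all samples.
    return [1 if ball in s else 0 for s in subsets]

x, y = indicator(0), indicator(1)    # two distinct balls
mu = ndraws / (M * N)

cov = sum(a * b for a, b in zip(x, y)) / len(subsets) - mu**2
print(cov)                           # exact empirical covariance: -2/45
print(-mu * (1 - mu) / (M * N - 1))  # formula -sigma^2/(MN-1): also -2/45
```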

Fixing $N$, for each $M$ define: $$\bar{X}^{n}_{M} = \frac{1}{M}\sum_{m=1}^{M} X_{m}^{n},$$ which has the following properties: $$\mathbb{E}[\bar{X}^{n}_{M}] = \mu$$

$$\begin{align} Var(\bar{X}^{n}_{M}) &= \frac{1}{M^{2}} \left[ M\, Var(X_{m}^{n}) + M(M-1)\, Cov(X_{m}^{n}, X_{m'}^{n}) \right] \\ &= \frac{1}{M} \left[ Var(X_{m}^{n}) + (M-1)\, Cov(X_{m}^{n}, X_{m'}^{n}) \right] \\ &= \frac{1}{M} \left[ \sigma^{2} - (M-1)\sigma^{2}/(MN-1) \right] \\ &= \frac{\sigma^{2}}{M}\left( \frac{M(N-1)}{MN-1} \right) \end{align}$$

Define $Y^{n}_{M} = \sqrt{M}(\bar{X}^{n}_{M} - \mu)$. Then, by the central limit theorem, $Y^{n}_{M}$ converges in distribution to $N(0, \sigma^{2}(N-1)/N)$. (The central limit theorem still applies even though the random variables are weakly dependent; see Theorem 1 of "The Central Limit Theorem for Dependent Random Variables" by Wassily Hoeffding and Herbert Robbins.)
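A minimal Monte Carlo check of this limit, assuming NumPy's `multivariate_hypergeometric` (the parameters $N = 4$, $\mu = 1/6$, and the replicate count are illustrative choices):

```python
import numpy as np

# Empirical variance of Y^1_M versus the exact and limiting values.
N, mu = 4, 1 / 6
sigma2 = mu * (1 - mu)
rng = np.random.default_rng(1)

for M in [60, 600, 6000]:
    ndraws = int(mu * M * N)
    counts = rng.multivariate_hypergeometric(np.full(N, M), ndraws, size=50_000)
    Y = np.sqrt(M) * (counts[:, 0] / M - mu)
    exact = sigma2 * M * (N - 1) / (M * N - 1)
    print(M, Y.var(), exact)      # both approach sigma^2 (N-1)/N ~ 0.1042
```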

For $n \neq n'$, the covariance of the averages equals the common pairwise covariance, since all $M^{2}$ cross terms are identical:

$$Cov(\bar{X}^{n}_{M}, \bar{X}^{n'}_{M}) = Cov(X^{n}_{m}, X^{n'}_{m'}) = -\sigma^{2}/(MN-1)$$

$$\Rightarrow Cov(Y^{n}_{M}, Y^{n'}_{M}) = -M\sigma^{2}/(MN-1) \rightarrow -\sigma^{2}/N$$

Thus, $(Y^{1}_{M}, \ldots , Y^{N}_{M})$ converges in distribution to a multivariate normal centered at $0$ with a covariance matrix that has $\sigma^{2}(N-1)/N$ on the diagonal and $-\sigma^{2}/N$ off the diagonal, i.e. $\sigma^{2}\left(I - \frac{1}{N}\mathbf{1}\mathbf{1}^{T}\right)$. (Note that this covariance matrix has rank $N-1$, reflecting the exact constraint $\sum_{n} Y^{n}_{M} = 0$.)

(To prove that $(Y^{1}_{M}, \ldots , Y^{N}_{M})$ does indeed converge to a multivariate normal, one shows that every linear combination of the coordinates converges to a normal, the Cramér–Wold device, which follows by the same argument used for a single $Y^{n}_{M}$.)
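The limiting covariance matrix and its rank can be checked directly, along with the empirical covariance of the $Y^{n}_{M}$ at a large $M$ (a sketch with my own illustrative parameters):

```python
import numpy as np

# Limiting covariance: sigma^2 (I - J/N), where J is the all-ones matrix.
N, mu = 4, 1 / 6
sigma2 = mu * (1 - mu)
Sigma = sigma2 * (np.eye(N) - np.ones((N, N)) / N)
print(np.linalg.matrix_rank(Sigma))          # rank N - 1 = 3

# Empirical covariance of (Y^1_M, ..., Y^N_M) for large M.
M = 6000
rng = np.random.default_rng(2)
counts = rng.multivariate_hypergeometric(np.full(N, M), int(mu * M * N), size=50_000)
Y = np.sqrt(M) * (counts / M - mu)
print(np.cov(Y, rowvar=False).round(4))      # close to Sigma entrywise
```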
