(Kind Of) Maximising the Variance of a Hypergeometric Distribution

Tags: discrete mathematics, optimization, probability distributions, sampling, statistics

I am playing around with a hypergeometric distribution.

Consider an urn containing $N$ balls, of which $R$ are red and $B$ are blue. Let $1\leq n\leq N$ be a sample size and $m\in(0,1]$ a margin of error.

Is it possible, for fixed $(N,m)$, to find a $B_{(N,m)}$ such that, for every $B\in\{0,1,\dots,N\}$ and every $1\leq n\leq N$,

$$\sum_{k=\left\lceil\frac{Bn}{N}-mn\right\rceil}^{\left\lfloor\frac{Bn}{N}+mn\right\rfloor}\frac{\binom{B}{k}\binom{N-B}{n-k}}{\binom{N}{n}}\geq \sum_{k=\left\lceil\frac{B_{(N,m)}n}{N}-mn\right\rceil}^{\left\lfloor\frac{B_{(N,m)}n}{N}+mn\right\rfloor}\frac{\binom{B_{(N,m)}}{k}\binom{N-B_{(N,m)}}{n-k}}{\binom{N}{n}}\ ?\tag{$\star$}$$
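For experimenting with small $N$, the sum in ($\star$) can be evaluated exactly. The following is a minimal Python sketch; the function name `coverage` is my own (hypothetical), and it takes $m$ as a `fractions.Fraction` so that the floor and ceiling in the summation limits are computed without floating-point error.

```python
from fractions import Fraction
from math import ceil, comb, floor


def coverage(N: int, B: int, n: int, m: Fraction) -> Fraction:
    """Exact value of the sum in (*): the probability that the number of
    blue balls k in a sample of size n (drawn without replacement from N
    balls, B of them blue) satisfies |k/n - B/N| <= m."""
    centre = Fraction(B * n, N)         # Bn/N, the hypergeometric mean
    lo = max(0, ceil(centre - m * n))   # lower summation limit, clipped at 0
    hi = min(n, floor(centre + m * n))  # upper summation limit, clipped at n
    # math.comb(a, b) returns 0 when b > a, so terms outside the support vanish.
    favourable = sum(comb(B, k) * comb(N - B, n - k) for k in range(lo, hi + 1))
    return Fraction(favourable, comb(N, n))


# Example: population of 20, half of it blue, sample of 8, margin of 10%.
print(coverage(20, 10, 8, Fraction(1, 10)))
```

Passing `m` as a float would also run, but values such as `0.1` are not exact in binary, which can move a limit like $\lceil Bn/N - mn\rceil$ by one when $Bn/N - mn$ is an integer.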

Given a small population size $N$, a margin of error $m$, and a confidence level $C$, I am trying to find a minimum sample size $n$ for estimating $p:=\frac{B}{N}$.

Let $\widehat{B_n}$ be the random variable counting the blue balls in a random sample of size $n$ drawn without replacement, so that
$$\widehat{p_n}:=\frac{\widehat{B_n}}{n}\approx \frac{B}{N}=p.$$

To turn this estimate into a confidence interval, I need to find the smallest $n_*$ such that, for all $n\geq n_*$,

$$\mathbb{P}[|p-\widehat{p_n}|\leq m]\geq C,$$

where the probability is the sum on the left of ($\star$).
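A minimal sketch of this search for a *known* $B$, assuming $0<C\leq 1$. The helper names `coverage` and `smallest_n_star` are hypothetical. Since monotonicity of the probability in $n$ is not assumed, it scans downward from $n=N$ (a census, where $\widehat{p_n}=p$ and the probability is $1$) and stops at the first $n$ that fails, which matches the "for all $n\geq n_*$" requirement.

```python
from fractions import Fraction
from math import ceil, comb, floor


def coverage(N, B, n, m):
    # Exact value of the sum in (*) for sample size n.
    centre = Fraction(B * n, N)
    lo, hi = max(0, ceil(centre - m * n)), min(n, floor(centre + m * n))
    favourable = sum(comb(B, k) * comb(N - B, n - k) for k in range(lo, hi + 1))
    return Fraction(favourable, comb(N, n))


def smallest_n_star(N, B, m, C):
    """Smallest n_* such that coverage >= C for every n with n_* <= n <= N.
    Assumes 0 < C <= 1, so that n = N always qualifies."""
    n_star = N
    for n in range(N - 1, 0, -1):  # walk down from N - 1 to 1
        if coverage(N, B, n, m) < C:
            break                  # n fails, so n_* = n + 1
        n_star = n
    return n_star


print(smallest_n_star(50, 20, Fraction(1, 10), Fraction(95, 100)))
```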

A priori we do not know the value of $B$, so we need to take a worst-case scenario: the value of $B$ that makes this probability smallest.

In the large-population approximation this worst case is $p=0.5$, and we can proceed from there, but things seem far more complicated here.
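Without any approximation, one brute-force way to handle the worst case for small $N$ is to minimise the sum in ($\star$) over all $B\in\{0,\dots,N\}$ for each $n$, then run the same downward search on that minimum. This is a sketch assuming $0<C\leq 1$; `worst_case` and `worst_case_n_star` are hypothetical names, and the code makes no assumption about which $B$ is worst.

```python
from fractions import Fraction
from math import ceil, comb, floor


def coverage(N, B, n, m):
    # Exact value of the sum in (*) for sample size n.
    centre = Fraction(B * n, N)
    lo, hi = max(0, ceil(centre - m * n)), min(n, floor(centre + m * n))
    favourable = sum(comb(B, k) * comb(N - B, n - k) for k in range(lo, hi + 1))
    return Fraction(favourable, comb(N, n))


def worst_case(N, n, m):
    """Return (B, probability) minimising the sum in (*) over B = 0..N.
    Ties go to the smallest such B."""
    return min(((B, coverage(N, B, n, m)) for B in range(N + 1)),
               key=lambda pair: pair[1])


def worst_case_n_star(N, m, C):
    """Smallest n_* such that, whatever B is, coverage >= C for all n >= n_*."""
    n_star = N
    for n in range(N - 1, 0, -1):
        if worst_case(N, n, m)[1] < C:
            break
        n_star = n
    return n_star


N, m, C = 40, Fraction(1, 10), Fraction(95, 100)
n_star = worst_case_n_star(N, m, C)
print(n_star, worst_case(N, n_star, m))
```

The cost is roughly $N^3$ binomial products, which is fine for the small populations the question targets.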

I am not interested in normal, binomial, or normal-with-finite-population-correction approaches.

Best Answer

It follows from a very simple symmetry argument that the answer is $B=N/2$.
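The symmetry in question is presumably the colour swap: replacing $B$ by $N-B$ and $k$ by $n-k$ maps the summation range of ($\star$) for $B$ onto the range for $N-B$ term by term, so the sum is the same for $B$ and $N-B$. The sketch below checks that identity exactly for a few small cases and lists the minimising values of $B$ so they can be compared with $N/2$; the helper names are hypothetical.

```python
from fractions import Fraction
from math import ceil, comb, floor


def coverage(N, B, n, m):
    # Exact value of the sum in (*) for sample size n.
    centre = Fraction(B * n, N)
    lo, hi = max(0, ceil(centre - m * n)), min(n, floor(centre + m * n))
    favourable = sum(comb(B, k) * comb(N - B, n - k) for k in range(lo, hi + 1))
    return Fraction(favourable, comb(N, n))


def is_colour_symmetric(N, n, m):
    """True if the sum in (*) is unchanged by swapping B and N - B."""
    return all(coverage(N, B, n, m) == coverage(N, N - B, n, m)
               for B in range(N + 1))


def minimisers(N, n, m):
    """All B in 0..N at which the sum in (*) is smallest."""
    values = [coverage(N, B, n, m) for B in range(N + 1)]
    lowest = min(values)
    return [B for B, value in enumerate(values) if value == lowest]


for N, n in [(10, 4), (11, 5), (20, 7)]:
    m = Fraction(1, 10)
    print(N, n, is_colour_symmetric(N, n, m), minimisers(N, n, m))
```

The check confirms only the pairing $B\leftrightarrow N-B$; the printed minimisers can be compared with $N/2$ directly.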
