[Math] Maximum likelihood estimate of hypergeometric distribution parameter

probability, probability-distributions

Using the notation in the Wikipedia article on the hypergeometric distribution, I'm curious how one would obtain the maximum likelihood estimate for parameter $m$, the number of white marbles, given $T$ trials from the same urn. For convenience, I'll copy/paste the notation from the article:

Suppose you are to draw $n$ marbles without replacement from an urn containing $N$ marbles in total, $m$ of which are white. The hypergeometric distribution describes the distribution of the number of white marbles drawn from the urn, $k$.

Now suppose I conduct $T$ trials on the same urn. At each trial I draw $n$ marbles, and $k_i$ is the number of white marbles drawn at trial $i$. Define $K = (k_1,\ldots,k_T)$. Then the likelihood function $L$ is:
$$L(m; K, N, n) = \prod_i^T \frac{\binom{m}{k_i}\binom{N-m}{n-k_i}}{\binom{N}{n}}$$

Taking a hint from this post, I first tried to solve the inequality:
$$L(m;K,N,n) \geq L(m-1;K,N,n)$$
when $T=1$. From this I obtained
$$m \leq \frac{Nk+k}{n}$$
so the MLE should be
$$m = \left\lfloor \frac{Nk+k}{n} \right\rfloor$$
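
For reference, the $T=1$ step can be sketched as follows. It uses the standard identities $\binom{m}{k}/\binom{m-1}{k} = \frac{m}{m-k}$ and $\binom{N-m}{n-k}/\binom{N-m+1}{n-k} = \frac{N-m-n+k+1}{N-m+1}$, and assumes $m > k$ and $m \leq N$ so that both denominators are positive:

$$\begin{aligned}
\frac{L(m)}{L(m-1)} = \frac{m}{m-k}\cdot\frac{N-m-n+k+1}{N-m+1} \geq 1
&\iff m\,(N-m+1) - m\,(n-k) \geq m\,(N-m+1) - k\,(N-m+1) \\
&\iff k\,(N-m+1) \geq m\,(n-k) \\
&\iff m \leq \frac{k(N+1)}{n}
\end{aligned}$$

Two edge cases are worth keeping in mind. When $k(N+1)/n$ is an integer, $m$ and $m-1$ give the same likelihood, so the maximizer is not unique. When $k = n$ the floor evaluates to $N+1$, so the estimate has to be capped at $N$.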

Now, I'm stuck when I try to generalize to $T \geq 2$.

I first tried doing the same as above and I ended up with the following unwieldy inequality:
$$\prod_i^T \frac{m}{m-k_i} \geq \prod_i^T \frac{N-m+1}{N-m-n+k_i+1}$$
which I'm not sure how to solve.

Then I tried to take the log of the likelihood and differentiate as if $m$ were defined over positive reals and I ended up with an equally unwieldy equation to solve:
$$\sum_i^T \left(\Psi(m+1) - \Psi(m-k_i+1) - \Psi(N-m+1) + \Psi(N-m-n+k_i+1)\right) = 0$$
where $\Psi$ is the digamma function (i.e. the derivative of the log-gamma function).

My intuition tells me the solution to either of the above would look something like this:
$$m = \left\lfloor \frac{(N+1)\sum_i^T k_i}{Tn} \right\rfloor$$
but I have no idea how to get here.
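
One way to probe this guess without solving the inequality is to compare it against an exhaustive search over integer $m$. The sketch below is only a numerical check, not a derivation. The function names and the sample data are made up for illustration. It drops the factor $\binom{N}{n}^T$, which does not depend on $m$, and uses exact integer arithmetic so that ties are not blurred by rounding.

```python
from math import comb, prod


def likelihood_numerator(m, ks, N, n):
    """Product over trials of C(m, k_i) * C(N - m, n - k_i).

    The denominator C(N, n)^T is the same for every m, so it is omitted.
    math.comb returns 0 when k_i > m or n - k_i > N - m, which makes
    impossible values of m score zero.
    """
    return prod(comb(m, k) * comb(N - m, n - k) for k in ks)


def brute_force_mle(ks, N, n):
    """Exact MLE by trying every m in 0..N (smallest m wins on ties)."""
    return max(range(N + 1), key=lambda m: likelihood_numerator(m, ks, N, n))


def conjectured_mle(ks, N, n):
    """The guess floor((N + 1) * sum(k_i) / (T * n)), in integer arithmetic."""
    return ((N + 1) * sum(ks)) // (len(ks) * n)


# Hypothetical data: T = 3 trials, N = 50 marbles, n = 10 draws per trial.
ks, N, n = [3, 4, 2], 50, 10
print("brute force:", brute_force_mle(ks, N, n))
print("conjecture: ", conjectured_mle(ks, N, n))
```

Running this over many random choices of `ks`, `N`, and `n` would show whether the guess holds in general or only approximately. Because `max` returns the first maximizer, a tie between $m-1$ and $m$ is reported as $m-1$.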

The motivation for this problem is pure curiosity, since I've never seen an MLE for the hypergeometric distribution in terms of $m$.

Best Answer

Here is an approximate solution. The Poisson approximation to the hypergeometric distribution, valid for $\frac{m}{N} \ll 1$ and $n \gg 1$, has the form:

$$P(K = k \mid n, m, N) = \frac{\exp\left(-\frac{nm}{N}\right) \left(\frac{nm}{N}\right)^k}{k!}$$

The likelihood function becomes

$$L(m;n,N) = \frac{\exp\left(-\frac{Tnm}{N}\right) \left(\frac{nm}{N}\right)^{\sum_i^T k_i}}{\prod_i^T k_i!}$$

Maximizing this over $m$, treated as a continuous variable, gives:

$$m = \frac{N\sum_i^T k_i}{Tn}$$
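
The maximization can be sketched by taking logs and differentiating with respect to $m$, under the same assumption that $m$ is continuous:

$$\begin{aligned}
\log L(m;n,N) &= -\frac{Tnm}{N} + \left(\sum_i^T k_i\right)\log\frac{nm}{N} - \sum_i^T \log k_i! \\
\frac{\partial \log L}{\partial m} &= -\frac{Tn}{N} + \frac{1}{m}\sum_i^T k_i = 0
\quad\Longrightarrow\quad m = \frac{N\sum_i^T k_i}{Tn}
\end{aligned}$$

The second derivative, $-\frac{1}{m^2}\sum_i^T k_i$, is negative whenever at least one $k_i > 0$, so this stationary point is a maximum. The result differs from the form guessed in the question only in having $N$ in place of $N+1$ and no floor, which is consistent with it being an approximation for small $\frac{m}{N}$ rather than an exact integer-valued estimate.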
