[Math] Maximum likelihood estimate of hypergeometric distribution parameter

Using the notation in the Wikipedia article on the hypergeometric distribution, I'm curious how one would obtain the maximum likelihood estimate for parameter $m$, the number of white marbles, given $T$ trials from the same urn. For convenience, I'll copy/paste the notation from the article:

Suppose you are to draw $n$ marbles without replacement from an urn containing $N$ marbles in total, $m$ of which are white. The hypergeometric distribution describes the distribution of the number of white marbles drawn from the urn, $k$.

Again, suppose I conduct $T$ trials: at each trial I draw $n$ marbles from the urn, and $k_i$ is the number of white marbles drawn at trial $i$. Define $K = (k_1,\ldots,k_T)$. Then the likelihood function $L$ is:
$$L(m; K, N, n) = \prod_i^T \frac{\binom{m}{k_i}\binom{N-m}{n-k_i}}{\binom{N}{n}}$$
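To experiment with the estimates discussed here, the likelihood can be evaluated directly. Below is a minimal Python sketch, assuming the $T$ trials are independent draws from the same urn composition (as the product form implies). The function names `log_binom` and `log_likelihood` are hypothetical. It works on the log scale with `math.lgamma` to avoid overflow and returns `-inf` outside the support $\max_i k_i \le m \le N - n + \min_i k_i$, where some binomial coefficient vanishes.

```python
import math

def log_binom(a: int, b: int) -> float:
    """log C(a, b) via log-gamma; assumes 0 <= b <= a."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)

def log_likelihood(m: int, ks: list[int], N: int, n: int) -> float:
    """log L(m; K, N, n) = sum_i log[C(m, k_i) C(N - m, n - k_i) / C(N, n)]."""
    # Outside the support some binomial coefficient is zero, so L = 0.
    if not (max(ks) <= m <= N - n + min(ks)):
        return -math.inf
    total = 0.0
    for k in ks:
        total += log_binom(m, k) + log_binom(N - m, n - k) - log_binom(N, n)
    return total

# Illustrative (made-up) data: brute-force the discrete MLE by scanning every m.
ks, N, n = [3, 1, 2, 4], 50, 10
m_hat = max(range(N + 1), key=lambda m: log_likelihood(m, ks, N, n))
print(m_hat)
```

Scanning all $m \in \{0, \ldots, N\}$ is only practical for moderate $N$, but it gives a reference answer against which any closed-form guess can be checked.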

Taking a hint from this post, I first tried to solve the inequality:
$$L(m;K,N,n) \geq L(m-1;K,N,n)$$
when $T=1$. From this I obtained
$$m \leq \frac{Nk+k}{n}$$
so the MLE should be
$$m = \left\lfloor \frac{Nk+k}{n} \right\rfloor$$
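The $T=1$ result can be checked numerically. The sketch below (hypothetical helper names) compares $\lfloor (N+1)k/n \rfloor$ with a brute-force scan over all $m$, using exact integer arithmetic via `math.comb`. One adjustment is an assumption worth flagging: the closed form is clamped to the support $k \le m \le N-n+k$, because in the edge case $k = n$ the formula gives $N+1$, which is not a valid number of white marbles (the likelihood $\binom{m}{n}/\binom{N}{n}$ is then maximised at $m = N$). When $(N+1)k/n$ is an integer, $m$ and $m-1$ have equal likelihood, so the check compares likelihood values rather than argmax positions.

```python
import math

def likelihood_numerator(m: int, k: int, N: int, n: int) -> int:
    """C(m, k) * C(N - m, n - k); the constant factor 1 / C(N, n) is dropped."""
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so values of m outside the support get likelihood 0 automatically.
    return math.comb(m, k) * math.comb(N - m, n - k)

def mle_closed_form(k: int, N: int, n: int) -> int:
    """floor((N + 1) k / n), clamped to the support k <= m <= N - n + k."""
    return max(k, min(N - n + k, (N + 1) * k // n))

def check(N: int, n: int) -> bool:
    """True if the closed form attains the maximum likelihood for every k."""
    for k in range(n + 1):
        best = max(likelihood_numerator(m, k, N, n) for m in range(N + 1))
        if likelihood_numerator(mle_closed_form(k, N, n), k, N, n) != best:
            return False
    return True

print(all(check(N, n) for N in range(1, 30) for n in range(1, N + 1)))
```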

Now, I'm stuck when I try to generalize to $T \geq 2$.

I first tried the same approach as above and ended up with the following unwieldy inequality:
$$\prod_i^T \frac{m}{m-k_i} \geq \prod_i^T \frac{N-m+1}{N-m-n+k_i+1}$$
which I'm not sure how to solve.
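Even without a closed form, this inequality can serve as an exact search criterion. On the support $\max_i k_i \le m \le N-n+\min_i k_i$, each factor $m/(m-k_i) = 1 + k_i/(m-k_i)$ on the left is non-increasing in $m$, and so is the reciprocal of each factor on the right, $(N-m-n+k_i+1)/(N-m+1) = 1 - (n-k_i)/(N-m+1)$. Hence the ratio $L(m)/L(m-1)$ is non-increasing, the likelihood is unimodal in $m$, and an MLE is the largest admissible $m$ at which the inequality holds. A minimal sketch with hypothetical function names, using exact `fractions.Fraction` arithmetic so that ties (ratio exactly $1$) are not lost to rounding; a linear scan is used for clarity, although the monotonicity would also permit bisection.

```python
from fractions import Fraction

def likelihood_ratio(m: int, ks: list[int], N: int, n: int) -> Fraction:
    """L(m) / L(m - 1), valid when both m - 1 and m lie in the support."""
    r = Fraction(1)
    for k in ks:
        r *= Fraction(m, m - k) * Fraction(N - m - n + k + 1, N - m + 1)
    return r

def mle_by_ratio(ks: list[int], N: int, n: int) -> int:
    """Largest m in the support with L(m) >= L(m - 1)."""
    lo, hi = max(ks), N - n + min(ks)  # support: lo <= m <= hi
    if lo > hi:
        raise ValueError("no urn composition is consistent with the counts")
    m_hat = lo
    # The ratio is non-increasing in m, so stop at the first failure.
    for m in range(lo + 1, hi + 1):
        if likelihood_ratio(m, ks, N, n) < 1:
            break
        m_hat = m
    return m_hat

print(mle_by_ratio([3, 1, 2, 4], N=50, n=10))
```

For $T = 1$ this reduces to the largest admissible $m \le (N+1)k/n$, in agreement with the floor formula above.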

Then I tried taking the log of the likelihood and differentiating as if $m$ were a positive real number, which gave an equally unwieldy equation:
$$\sum_i^T \left(\Psi(m+1) - \Psi(m-k_i+1) - \Psi(N-m+1) + \Psi(N-m-n+k_i+1)\right) = 0$$
where $\Psi$ is the digamma function (i.e. the derivative of the log-gamma function).
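This equation can at least be solved numerically. Differentiating its left-hand side once more gives $\sum_i \left(\Psi'(m+1) - \Psi'(m-k_i+1) + \Psi'(N-m+1) - \Psi'(N-m-n+k_i+1)\right)$, and since the trigamma function $\Psi'$ is decreasing, every term is non-positive: the continuous log-likelihood is concave, so the root is unique on the interval where all $\Psi$ arguments are positive. Python's standard library has no digamma, so the sketch below uses a hand-rolled approximation (the recurrence $\Psi(x) = \Psi(x+1) - 1/x$ plus the standard asymptotic series); `scipy.special.digamma` would be the natural substitute if SciPy is available. Function names are hypothetical, and the sketch assumes at least one $k_i > 0$ and at least one $k_i < n$, so that the score changes sign on that interval.

```python
import math

def digamma(x: float) -> float:
    """Approximate digamma Psi(x) for x > 0 (recurrence + asymptotic series)."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    result = 0.0
    while x < 6.0:  # shift upwards using Psi(x) = Psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Psi(x) ~ ln x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result

def score(m: float, ks: list[int], N: int, n: int) -> float:
    """Left-hand side of the stationarity equation (d/dm of log L)."""
    return sum(
        digamma(m + 1) - digamma(m - k + 1)
        - digamma(N - m + 1) + digamma(N - m - n + k + 1)
        for k in ks
    )

def continuous_mle(ks: list[int], N: int, n: int) -> float:
    """Bisection for the root of score() on the interval where every
    digamma argument is positive; assumes some k_i > 0 and some k_i < n."""
    if max(ks) == 0 or min(ks) == n:
        raise ValueError("score has no sign change; maximum is at an endpoint")
    lo = max(ks) - 1 + 1e-9          # keeps m - k_i + 1 > 0 for all i
    hi = N - n + min(ks) + 1 - 1e-9  # keeps N - m - n + k_i + 1 > 0 for all i
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid, ks, N, n) > 0:  # log-likelihood still increasing
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(continuous_mle([3, 1, 2, 4], N=50, n=10))
```

Because the log-gamma extension is concave and agrees with the discrete log-likelihood at integers, the discrete MLE is $\lfloor \hat m \rfloor$ or $\lceil \hat m \rceil$, whichever has the larger likelihood.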

My intuition tells me the solution to either of the above would look something like this:
$$m = \left\lfloor \frac{(N+1)\sum_i^T k_i}{Tn} \right\rfloor$$
but I have no idea how to get here.
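One way to probe this guess is to compare it with the exact discrete MLE over many small examples. The sketch below (hypothetical names) enumerates every outcome vector for small urns with $T = 2$ and collects the cases where the guessed value does not attain the maximum likelihood. As an assumption, the guess is clamped to the support $\max_i k_i \le m \le N-n+\min_i k_i$, since unclamped it can fall outside it (for example when every $k_i = n$ it equals $N+1$). No claim is made here about what the search turns up.

```python
import math
from itertools import product

def likelihood(m: int, ks: tuple[int, ...], N: int, n: int) -> int:
    """Product of C(m, k_i) C(N - m, n - k_i); the constant C(N, n)^T is dropped."""
    return math.prod(math.comb(m, k) * math.comb(N - m, n - k) for k in ks)

def guess(ks: tuple[int, ...], N: int, n: int) -> int:
    """floor((N + 1) * sum(k_i) / (T n)), clamped to the support."""
    raw = (N + 1) * sum(ks) // (len(ks) * n)
    return max(max(ks), min(N - n + min(ks), raw))

def counterexamples(max_N: int, T: int = 2):
    """Yield (N, n, ks) where the guess does not maximise the likelihood."""
    for N in range(1, max_N + 1):
        for n in range(1, N + 1):
            for ks in product(range(n + 1), repeat=T):
                if max(ks) > N - n + min(ks):
                    continue  # no urn composition can produce these counts
                best = max(likelihood(m, ks, N, n) for m in range(N + 1))
                if likelihood(guess(ks, N, n), ks, N, n) != best:
                    yield N, n, ks

bad = list(counterexamples(max_N=12))
print(len(bad), bad[:5])
```

Running it with `T=1` should yield no counterexamples, since that case is settled by the $T=1$ argument above; larger `T` or `max_N` values widen the search at the cost of running time.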

The motivation for this problem is pure curiosity, since I've never seen an MLE for the hypergeometric distribution in terms of $m$.

Best Answer

Here is an approximate solution. The Poisson approximation to the hypergeometric distribution, valid when $m/N \ll 1$ and $n \gg 1$ (and $n \ll N$, so that drawing without replacement behaves like drawing with replacement), has the form:

$$P(k \mid n, m, N) \approx \frac{\exp\left(-\frac{nm}{N}\right)\left(\frac{nm}{N}\right)^{k}}{k!}$$

The likelihood function becomes

$$L(m; K, N, n) = \frac{\exp\left(-\frac{Tnm}{N}\right)\left(\frac{nm}{N}\right)^{\sum_i^T k_i}}{\prod_i^T k_i!}$$

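Up to terms not involving $m$, its logarithm is $-\frac{Tnm}{N} + \left(\sum_i^T k_i\right)\log m$. Setting the derivative $-\frac{Tn}{N} + \frac{\sum_i^T k_i}{m}$ to zero gives: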

$$m = \frac{N\sum_i^T k_i}{Tn}$$
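As a rough numerical comparison, the sketch below (hypothetical names) evaluates the approximate estimate $N\sum_i k_i/(Tn)$ and, for reference, the exact discrete MLE found by scanning the hypergeometric log-likelihood over the support. The counts are made up for illustration and chosen so that $m/N$ is small and $n$ is large but still small compared with $N$; how closely the two agree on other data is not asserted here.

```python
import math

def poisson_estimate(ks: list[int], N: int, n: int) -> float:
    """Approximate MLE from the Poisson likelihood: N * sum(k_i) / (T n)."""
    return N * sum(ks) / (len(ks) * n)

def log_lik(m: int, ks: list[int], N: int, n: int) -> float:
    """Exact hypergeometric log-likelihood, up to an additive constant."""
    def log_comb(a: int, b: int) -> float:
        return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)
    return sum(log_comb(m, k) + log_comb(N - m, n - k) for k in ks)

def exact_mle(ks: list[int], N: int, n: int) -> int:
    """Scan the support max(k_i) <= m <= N - n + min(k_i)."""
    support = range(max(ks), N - n + min(ks) + 1)
    return max(support, key=lambda m: log_lik(m, ks, N, n))

# Illustrative (made-up) counts from T = 5 draws of n = 500 out of N = 10000.
ks, N, n = [4, 6, 5, 7, 3], 10_000, 500
print(poisson_estimate(ks, N, n), exact_mle(ks, N, n))
```

The Poisson estimate is a real number, so in practice one would round it to a nearby admissible integer before using it as an estimate of $m$.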
