Estimating Lambda in a Poisson population where not all samples can be observed

estimation, poisson-distribution, probability

Let $(x_1, x_2, \dots , x_n)$ be a random sample from a population which follows a Poisson distribution with an unknown mean $\lambda$. Assume that $C$ is a known constant and that we can only observe the values of the sample for which $x_i < C$.
I want to estimate $\lambda$ using only these observed samples.

I first define two variables, $r$ and $p$, which can be defined as:

$$r = \max(i: x_{(i)} < C)$$
$$p = \max(i: x_{(i)} \le C-2)$$
where $x_{(i)}$ denotes the $i$th order statistic. I assume for convenience that $x_1, x_2, \dots, x_p, \dots, x_r$ are the observed samples, so they are ordered.

So if $X$ is a Poisson-distributed random variable with probability mass function $p(x)$ and mean $\lambda$, and $C$ is any positive integer constant, then

$$\lambda = \sum_{x=0}^{\infty}\space x\space p(x)$$

can be split into $$\lambda = \sum_{x=C}^{\infty}\space x\space p(x) + \sum_{x=0}^{C-1}\space x\space p(x)$$

I can calculate the first part directly:
$$\sum_{x=C}^{\infty}\space x\space p(x) = \lambda(1-F(C-2)),$$ where $F(\cdot)$ is the CDF of the Poisson distribution.
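For completeness, this identity follows by shifting the summation index:

$$\sum_{x=C}^{\infty} x\,\frac{e^{-\lambda}\lambda^{x}}{x!} = \lambda \sum_{x=C}^{\infty} \frac{e^{-\lambda}\lambda^{x-1}}{(x-1)!} = \lambda \sum_{y=C-1}^{\infty} \frac{e^{-\lambda}\lambda^{y}}{y!} = \lambda\,(1 - F(C-2)).$$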

An estimate of the second part would be:

$$\frac{1}{n} \sum_{i=1}^r x_i$$ and if $\bar{x_r}$ is the mean of the first $r$ observations, we can write this as:

$$\frac{r}{n} \bar{x_r}$$

Now, we can estimate $F(C-2)$ as $\frac{p}{n}$.

So combining the terms, $\lambda \approx \lambda\left(1-\frac{p}{n}\right) + \frac{r}{n}\bar{x_r}$, and solving for $\lambda$, I get the estimator:

$$\hat\lambda = \frac{r\bar{x_r}}{p}$$
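For concreteness, here is a minimal R sketch of this plug-in estimator on simulated data; the true $\lambda$, the sample size, and the variable names are my own choices for illustration:

```r
set.seed(1)
lambda_true <- 10
C <- 13
n <- 1e5

x   <- rpois(n, lambda_true)   # full sample; values >= C are unobservable
obs <- x[x < C]                # the r observed values

r <- length(obs)
p <- sum(obs <= C - 2)         # number of observations <= C - 2
lambda_hat <- r * mean(obs) / p
lambda_hat                     # close to 10 when C is not too small relative to lambda
```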

This seems to be a good estimator, but when $C$ becomes very small relative to the true mean, the estimate loses accuracy.

The reason seems to be that estimating $F(C-2)$ from the observed samples isn't that accurate when $C$ gets small compared to $\lambda$, even if I use a large sample size (>100K).

So the questions I'm thinking about:

  • Is there a more accurate way to estimate $F(C-2)$?
  • Or maybe there is something wrong with the math? In that case, please point it out.
  • Or maybe there is an easier way to estimate $\lambda$ from limited observed samples?

EDIT

I want to expand a bit based on the comments.

We can also say that $X$ follows a truncated Poisson distribution, conditional on the event $X < C$, where $C$ is the known truncation level.

Reading off the definition, I can write the PMF of the $C$-truncated Poisson distribution as $$\frac{p(x)}{F(C-1)}, \qquad x = 0, 1, \dots, C-1$$
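In R, this PMF can be evaluated directly with `dpois` and `ppois` (a small sketch; the function name is my own choice):

```r
# PMF of Poisson(lambda) truncated to X < C:  p(x) / F(C - 1) for x = 0, ..., C - 1
dtpois <- function(x, lambda, C) {
  ifelse(x < C, dpois(x, lambda) / ppois(C - 1, lambda), 0)
}

dtpois(0:7, lambda = 10, C = 8)   # these C probabilities sum to 1
```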

If I then work out the log-likelihood function for $\lambda$, given the samples $x_1, x_2, \dots, x_p, \dots, x_r$, I get:

$$L(\lambda \mid x_1, x_2, \dots, x_p, \dots, x_r) = \log(\lambda)\sum_{i=1}^r x_i - r\log\left(\sum_{i=0}^{C-1} \frac{\lambda^i}{i!}\right) $$

Maximizing this function in $\lambda$ indeed gives me a good estimate of $\lambda$, but it seems that we always need a numerical method for it. If someone can elaborate more from this perspective, that is always welcome as well.
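For what it's worth, here is a small R sketch of that numerical step (my own illustration, not a canonical implementation). It uses the fact that $\sum_{i=0}^{C-1}\lambda^i/i! = e^{\lambda}F(C-1)$, so the log-likelihood can be written with the Poisson CDF `ppois` (which is why a $-r\lambda$ term appears below) and maximized with `optimize`:

```r
# Log-likelihood of lambda for samples truncated to X < C (i.e. X <= k with k = C - 1):
# sum(y) * log(lambda) - r * lambda - r * log F_lambda(k)
loglik <- function(lambda, y, k) {
  sum(y) * log(lambda) - length(y) * lambda - length(y) * ppois(k, lambda, log.p = TRUE)
}

# Simulated example (true lambda = 10, truncation at C = 8)
set.seed(1)
C <- 8
y <- local({ x <- rpois(5000, 10); x[x < C] })
lambda_mle <- optimize(loglik, interval = c(1e-3, 50), y = y, k = C - 1,
                       maximum = TRUE)$maximum
lambda_mle    # should be reasonably close to 10
```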

Best Answer

Here's the MLE derivation:

We have underlying r.v.s $X_1,X_2,\dots$ i.i.d $\text{Poisson($\lambda$)}$, but we only observe those $X_i$ for which $X_i\le k$ (letting $k=C-1$ for convenience). To better distinguish the observations from the possibly-unobserved r.v.s, I'll write the observations (say $r$ of them) as $Y_1,Y_2,\ldots,Y_r.$

The joint PMF of $Y_1,\ldots,Y_r$ is therefore

$$\begin{align}P(Y_1=y_1,...,Y_r=y_r) &=\prod_{i=1}^rP(Y_i=y_i)\\ &=\prod_{i=1}^rP(X_{j_i}=y_i\mid X_{j_i}\le k)\quad\text{for the corresponding $X_{j_i}$}\\[1ex] &=\prod_{i=1}^r{P(X_{j_i}=y_i)\over P(X_{j_i}\le k)}\\[1ex] &=\prod_{i=1}^r{e^{-\lambda}\lambda^{y_i}/y_i!\over F_\lambda(k)}\\[1ex] &\propto e^{-r\lambda}\lambda^{\sum_{i=1}^ry_i}[F_\lambda(k)]^{-r}\\[1ex] \end{align}$$

where $F_\lambda$ is the $\text{Poisson($\lambda$)}$ CDF. The log-likelihood function is therefore

$$-r\lambda +\left(\sum_{i=1}^ry_i\right)\log\lambda - r \log F_\lambda(k)\tag{1}$$

which is another way of writing the expression you wrote in your latest edit.

The MLE for $\lambda$ is then the value $\hat\lambda$ that maximizes (1), given the observations and the known value $k.$ This answer lists an R program that accomplishes this numerically in the case when $k$ is unknown, using (in the present notation)

$$\hat{\lambda}_0 ={{\sum_{i=1}^r y_i}\over{\sum_{i=1}^r I(y_i<\hat{k})}}={\sum_{i=1}^r y_i\over\text{number of observations less than $\hat k$}} $$

as the starting value, where $\hat k=\max(y_1,\dots,y_r)$ is an estimate of $k$. (They cite Moore (1952, Biometrika) for this.) Note that if the estimate $\hat k$ is replaced by our known value $k$, then this initial estimate $\hat \lambda_0$ is precisely the estimate you've proposed, i.e. ${r \bar y_r\over\text{number of observations less than $k$}}={r \bar y_r\over p}.$


Based on $10^5$ Monte Carlo trials with $\lambda=10,$ $k\in\{5,8,10,12,15\}$, and $r\in\{25, 50, 100\},$ it appears that $\hat\lambda_0$ has less bias but slightly more sampling variance than does the MLE. (I simply adapted the linked R program to the case of known $k$.) Here are typical results for $\lambda=10, r=50, k=5:$

[Histograms of $\hat\lambda_0$ and $\hat\lambda_{MLE}$ over the Monte Carlo trials]

This case: $\text{est.}\mathbb E(\hat\lambda_0)=10.3$, $\text{est.}\sigma(\hat\lambda_0)=2.1,\quad$ $\text{est.}\mathbb E(\hat\lambda_{MLE})=10.7,$ $\text{est.}\sigma(\hat\lambda_{MLE})=2.0.$
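For reference, a compact version of such a Monte Carlo comparison might look like the sketch below (my own adaptation with a reduced number of trials, not the linked program):

```r
# Compare lambda_0 = sum(y) / #{y < k} with the numerical MLE, with k known.
loglik <- function(lambda, y, k) {
  sum(y) * log(lambda) - length(y) * lambda - length(y) * ppois(k, lambda, log.p = TRUE)
}

one_trial <- function(lambda, k, r) {
  y <- numeric(0)
  while (length(y) < r) {                 # keep sampling until r truncated values are seen
    x <- rpois(10 * r, lambda)
    y <- c(y, x[x <= k])
  }
  y <- y[1:r]
  lambda0 <- sum(y) / sum(y < k)
  mle <- optimize(loglik, interval = c(1e-3, 5 * lambda), y = y, k = k,
                  maximum = TRUE)$maximum
  c(lambda0 = lambda0, mle = mle)
}

set.seed(1)
res <- replicate(2000, one_trial(lambda = 10, k = 5, r = 50))
rowMeans(res)        # Monte Carlo means of the two estimators
apply(res, 1, sd)    # Monte Carlo standard deviations
```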
