Probability – How to Estimate the Mean of a Poisson Distribution from Data

Tags: probability, probability distributions, statistics

I have thought of three different approaches to estimating the mean of a Poisson distribution, but I am not sure which one is correct (the third approach is documented separately at the end of the question).

For the sake of a concrete example, say that we want to find the Poisson distribution for the number of cars passing by in an hour (in front of our house or whatever).

Say that we want to estimate this by standing outside our house for $t$ hours and counting the number $n$ of cars we see.

Then we could approximate the mean $\lambda$ as:

$$\lambda \approx \frac{n}{t}$$

where $\lambda$ is the mean number of cars that we see per hour.

That is the first approach (which is the one I believe is correct).

(Note: I know the first approach is easier to carry out in real life for this specific example, but I am not concerned with that; I am concerned with mathematical correctness.)
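As a quick illustration, here is a minimal simulation sketch of this first approach, assuming NumPy; `lam_true` and `t` are hypothetical values used only to generate data, not anything given in the question:

```python
import numpy as np

rng = np.random.default_rng(0)
lam_true = 3.0   # hypothetical true rate (cars per hour), used only to simulate
t = 100.0        # hours spent watching

# In a Poisson model, the count of cars seen in t hours is Poisson(lam_true * t).
n = rng.poisson(lam_true * t)

lam_hat = n / t  # the first approach: lambda is approximately n / t
print(lam_hat)   # lands close to 3.0 for large t
```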

The second approach is the following.

Instead, imagine that for some reason we are only allowed to record how long it takes us to see a single car. We record how long it took to see car $i$ as $\tau_i$ (in hours). Now we could estimate how many cars we expect to see in 1 hour by computing:

$$ \lambda_i \approx \frac{1}{\tau_i}$$

[Note that if $\tau_i < 1$, then the estimated mean number of cars seen per hour is greater than 1.]

Now say that instead we repeat this on $k$ independent days, collecting the time periods $\tau_1, \dots, \tau_k$, and that we estimate the "global" mean by averaging the individual estimates:

$$\lambda = \frac{1}{k}\sum^{k}_{i=1} \lambda_i = \frac{1}{k}\sum^{k}_{i=1} \frac{1}{\tau_i}$$

The second method might seem a little strange, but I was wondering whether the two methods are somehow equivalent, or whether the second one is completely wrong, and if so, why. The first one seems to be correct, but I can't seem to "prove" to myself why my intuition says so.
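To see numerically that the two approaches are not equivalent, one can compare them by Monte Carlo. The following is a sketch under assumed illustrative values (`lam_true`, `t`, `k`), again using NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
lam_true, trials = 3.0, 10_000  # hypothetical rate and number of experiments

# Approach 1: watch for t hours, estimate n / t.
t = 10.0
est1 = rng.poisson(lam_true * t, size=trials) / t

# Approach 2: on each of k days, time one car (tau_i ~ Exp(lam_true)),
# then average the reciprocals 1 / tau_i.
k = 10
tau = rng.exponential(scale=1.0 / lam_true, size=(trials, k))
est2 = np.mean(1.0 / tau, axis=1)

print(est1.mean())  # close to lam_true
print(est2.mean())  # well above lam_true, and unstable across seeds
```

(The accepted answer below explains why approach 2 misbehaves: $\mathbb{E}[1/\tau_i]$ is infinite.)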

[Notice that the second method has an interesting property: instead of weighting all the $\tau_i$ equally, we could take a weighted average to encode which $\tau_i$ we trust more for a particular application. A little tangential to my original question, but an interesting thought…]


Bounty Section

I forgot to add this the first time I asked the question and thought it was important to add it now (since this was the reason my question came up in the first place!).

I have a different method for estimating the mean and was wondering if it was correct.

Instead of waiting outside for $t$ hours, what if you did the following?

You wait outside and record how much time it takes to see one car. Let $\tau_i$ be the amount of time you waited to see the $i$th car. However, notice that after you see a car, you stop your stopwatch, and only later (maybe on another day) do you restart it and wait to see the next single car (otherwise, if you just stopped your stopwatch and restarted it immediately, it would be the same as the original MLE estimator I was asking about). You obviously repeat this a number of times; in fact, assume you do it $n$ times (i.e. you see $n$ cars and record how long it took to see each one). Then, instead of my previous method of computing $\frac{1}{\tau_i}$, you try something similar to the first maximum likelihood method:

$$\lambda \approx \frac{n}{t} = \frac{n}{\sum^{n}_{i=1} \tau_i}$$

where $t$ is the total time it took you to see $n$ cars. But this time the cars were seen in $n$ independent "samples". It feels like this method might not be correct, but I was not sure. Is there something that requires the total time interval $t$ to be one consecutive time interval?

Best Answer

I'll start by commenting on your second approach. Since your observations form a Poisson process, the time $\tau_1$ that you have to wait to observe the first car follows an exponential distribution, $\tau_1\sim\mathrm{Exp}(\lambda)$, where $\lambda$ is the intensity of the Poisson process.

Since $\tau_1\sim\mathrm{Exp}(\lambda)$, it indeed holds that

$$\mathbb{E}[\tau_1]=\frac{1}{\lambda}.$$

However, estimating $\lambda$ by $1/\tau_1$ is problematic: the estimator is not unbiased, and in fact it does not even have a finite expectation. Indeed,

$$\mathbb{E}\left[\frac{1}{\tau_1}\right]=+\infty,$$

which does not conform to your intuition that $\mathbb{E}[1/\tau_1]=\lambda$.
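To make this explicit (a standard computation, filling in the step): since $\tau_1$ has density $\lambda e^{-\lambda x}$ on $(0,+\infty)$,

$$\mathbb{E}\left[\frac{1}{\tau_1}\right]=\int_0^\infty\frac{1}{x}\,\lambda e^{-\lambda x}\,\mathrm dx=+\infty,$$

because near $x=0$ the integrand behaves like $\lambda/x$, whose integral diverges.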

Now, your first estimator is a more natural one, known in statistics as the maximum likelihood estimator (MLE). Your idea is to estimate $\lambda$ by

$$\widehat{\lambda}_1=\frac{N_t}{t},$$

where $N_t$ is the number of cars that you see in a time interval of length $t$. In this case,

$$\mathbb{E}[\widehat{\lambda}_1]=\frac{1}{t}\mathbb{E}[N_t]=\frac{1}{t}\lambda t=\lambda.$$

Lastly, note that your idea of doing many estimations and taking an average can also be applied in this case. You may count the number of cars that arrive each day in $t$ hours, and denote this number by $n_i$ for day $i$. Then, you may estimate $\lambda$ by

$$\widehat{\lambda}_2=\frac{1}{k}\sum_{i=1}^k\frac{n_i}{t},$$

and this estimator is indeed very natural.
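A quick Monte Carlo check of $\widehat\lambda_2$, as a sketch with assumed illustrative parameters (none of these values come from the question):

```python
import numpy as np

rng = np.random.default_rng(2)
lam_true, t, k, trials = 3.0, 2.0, 5, 10_000  # illustrative values

# Count the cars seen on each of k independent days of t hours,
# then average the k daily rates n_i / t.
n_i = rng.poisson(lam_true * t, size=(trials, k))
lam2 = (n_i / t).mean(axis=1)

print(lam2.mean())  # close to lam_true: the averaged estimator is unbiased
```

Note that since every day has the same length $t$, averaging the daily rates is the same as pooling the counts: $\widehat\lambda_2=\frac{\sum_i n_i}{kt}$.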


Bounty Section:

Let me just formalize your answer slightly. Assume that you start observing at time $T_0$ and the cars arrive at times $T_1<T_2<T_3<\cdots$. Denote by $\tau_i$ the waiting time preceding car $i$, so that $\tau_1=T_1-T_0$ and, for $i>1$, $\tau_i=T_i-T_{i-1}$ (note: as explained in the third section, each of these has the same distribution as the time you have to wait to see a car go by, starting from any time $t$).

Since the arrival times $T_1<T_2<\cdots$ form a Poisson process of intensity $\lambda$, the following properties hold:

  • $N_t\sim\mathrm{Poiss}(\lambda t)$, or in other words, the number of cars that arrive in an interval of length $t$ has a Poisson distribution of parameter $\lambda t$;
  • for any $i\in\mathbb N$, $\tau_i\sim\mathrm{Exp}(\lambda)$, i.e. the time between the arrival times of two cars is distributed as an exponential of parameter $\lambda$;
  • for any $i\in\mathbb N$, $T_i\sim\mathrm{Gamma}(i,\lambda)$, i.e. the arrival time of car number $i$ is distributed as a Gamma random variable of parameters $i$ and $\lambda$ (a short simulation checking these properties is sketched just below).
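For intuition, here is a minimal simulation sketch of these three properties, assuming NumPy; `lam`, `t`, and the other parameters are illustrative values, not part of the original answer:

```python
import numpy as np

rng = np.random.default_rng(3)
lam, t, trials = 2.0, 5.0, 20_000  # illustrative intensity, window, runs
n_max = 100                        # enough arrivals to cover [0, t]

# Build many Poisson processes from i.i.d. Exp(lam) inter-arrival times.
tau = rng.exponential(scale=1.0 / lam, size=(trials, n_max))
T = np.cumsum(tau, axis=1)         # arrival times T_1 < T_2 < ...

N_t = (T <= t).sum(axis=1)         # number of cars seen in [0, t]
print(N_t.mean(), N_t.var())       # both close to lam * t, as for Poiss(lam * t)

i = 4
print(T[:, i - 1].mean())          # close to i / lam, the Gamma(i, lam) mean
```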

So in fact, $\sum_{i=1}^n\tau_i=T_n-T_0$ represents the time it takes for $n$ cars to go by, when starting the observation at a time $T_0$. Now, there are two points I'd like to make. First, note that the $\tau_i$ are independent samples from an exponential distribution. Thus, by the strong law of large numbers,

$$ \frac1n\sum_{i=1}^n\tau_i\xrightarrow[n\rightarrow+\infty]{}\mathbb E[\tau_1]=\frac1\lambda. $$

Hence, since $\lambda>0$, your estimator tends almost surely to $\lambda$ as $n$ goes to infinity:

$$ \widehat\lambda_3=\frac{n}{\sum_{i=1}^n\tau_i}\xrightarrow[n\rightarrow+\infty]{}\lambda. $$
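This almost-sure convergence is easy to watch numerically; a minimal sketch with an assumed intensity `lam`:

```python
import numpy as np

rng = np.random.default_rng(4)
lam = 2.0  # assumed true intensity

# Watch the estimator n / sum(tau_i) settle near lam as n grows.
for n in (10, 100, 10_000, 1_000_000):
    tau = rng.exponential(scale=1.0 / lam, size=n)
    print(n, n / tau.sum())  # approaches lam by the strong law of large numbers
```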

Second, since $T_n-T_0=\sum_{i=1}^n\tau_i$ is a sum of $n$ independent exponential random variables, it holds that $T_n-T_0\sim\mathrm{Gamma}(n,\lambda)$. That is, its probability density function is given by

$$ f_n(x)=\frac{x^{n-1}}{\Gamma(n)}\lambda^ne^{-\lambda x}\mathbb 1_{(0,+\infty)}(x). $$

Hence, you may calculate the expectation of your estimator:

$$ \mathbb E\left[\widehat\lambda_3\right]=n\int_0^\infty\frac{x^{n-2}}{\Gamma(n)}\lambda^ne^{-\lambda x}\,\mathrm dx. $$

As seen previously, the integral diverges for $n=1$. For $n\ge2$, however, you can compute the integral as

$$ \mathbb E\left[\widehat\lambda_3\right]=n\frac{\lambda\Gamma(n-1)}{\Gamma(n)}\underbrace{\int_0^\infty\frac{x^{n-2}}{\Gamma(n-1)}\lambda^{n-1}e^{-\lambda x}\,\mathrm dx}_{=1}=\frac n{n-1}\lambda. $$

Therefore, it seems wiser to define

$$ \widehat\lambda_4=\frac{n-1}{\sum_{i=1}^n\tau_i}, $$

for $n\ge2$. This estimator will still converge to $\lambda$ almost surely, but will additionally be such that $\mathbb E\left[\widehat\lambda_4\right]=\lambda$.

In other words, $\widehat\lambda_4$ is consistent and unbiased.
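Both claims can be checked by Monte Carlo; this sketch assumes NumPy and illustrative values for `lam` and `n`:

```python
import numpy as np

rng = np.random.default_rng(5)
lam, n, trials = 2.0, 5, 100_000  # illustrative; n >= 2 so the mean is finite

# Each trial: wait for n cars, then form both estimators from the total time.
total = rng.exponential(scale=1.0 / lam, size=(trials, n)).sum(axis=1)
lam3 = n / total        # biased upward by a factor n / (n - 1)
lam4 = (n - 1) / total  # bias-corrected version

print(lam3.mean())  # close to n / (n - 1) * lam = 2.5
print(lam4.mean())  # close to lam = 2.0
```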


Clarifying some points:

In your edit, you say "otherwise, if you just stopped your stopwatch and restarted it immediately, it would be the same as the original MLE estimator I was asking about". This is not true. If you do this $n$ times, you will wait a total time that follows a $\Gamma$ distribution, as mentioned previously. The difference is that for the original MLE estimator you observe for a fixed period $t$, instead of waiting for a fixed count of $n$ cars. As you can see, the two methods yield very different results.

You also mention that you want to stop your stopwatch and restart it at a later time, instead of straight away.

This does not change anything, since exponential distributions are memoryless. Indeed, let us assume that you observe the first car and stop your stopwatch. Then you start it again at some later time $t$. Say that $T_i\le t<T_{i+1}$, i.e. you restart your stopwatch between cars $i$ and $i+1$.

You can in fact compute the distribution of $T_{i+1}-t$ (i.e. the time you wait until the next car), and it is $\mathrm{Exp}(\lambda)$. This is related to the inspection paradox and can be unintuitive at first sight; it is a consequence of the memoryless property of exponential random variables.

So, to summarize: whenever you activate your stopwatch, the waiting time $\tau_i$ is exponentially distributed with parameter $\lambda$. Thus $\sum_{i=1}^n\tau_i$ is indeed $\Gamma(n,\lambda)$-distributed, since the $\tau_i$ are independent.
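The memoryless property is also easy to verify by simulation; a sketch with assumed values, where `s` stands in for an arbitrary stopwatch restart time:

```python
import numpy as np

rng = np.random.default_rng(6)
lam, s, trials = 2.0, 7.3, 50_000  # lam and the restart time s are illustrative

waits = np.empty(trials)
for j in range(trials):
    # Simulate arrivals well past time s, then measure the wait from s onward.
    T = np.cumsum(rng.exponential(scale=1.0 / lam, size=100))
    waits[j] = T[T > s][0] - s     # time from s until the next arrival

print(waits.mean())                # close to 1 / lam, as for Exp(lam)
print(waits.var())                 # close to 1 / lam**2, as for Exp(lam)
```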
