[Math] Poisson distribution in maximum likelihood estimator

maximum likelihood, poisson distribution, statistics

Let $x\sim \text{Poisson}(\lambda)$ with $P(x=k)=\frac{\lambda^k}{k!}e^{-\lambda}$, and let $D=\{x_1,x_2,x_3,\ldots,x_N\}$ be a set of data. I want to find a maximum likelihood estimator $\hat \theta$ such that $\hat \theta_{MLE}=\arg \max\{P(\lambda |x)\}$.

$$\hat \theta_{MLE}=\arg \max_\lambda \{P(\lambda |x)\}=\arg \max_\lambda \{P(x=k |\lambda)\}$$

Is this a good approach?

Why is this correct:
$$P(x=k |\lambda)=\prod_{i=1}^N \frac{\lambda^{k_i}}{k_i!}e^{-\lambda}$$
Why not just:
$$P(x=k |\lambda)= \frac{\lambda^{k}}{k!}e^{-\lambda}$$

Best Answer

Note that your second expression is just a special case of the first, with $n=1$. Hence it is sufficient to analyse the first assertion for a general $n\geq 1$ and then see what happens in the case $n=1$.


If you just look at a single observation (i.e. $X_1$) instead of all observations (i.e. $X_1,\dots,X_n$), you are obviously discarding a lot of information that could be used to estimate the unknown quantity more precisely.

Suppose you are given an i.i.d. sample $X_1,\dots,X_n$, $n\geq 1$, all sampled from a $Poisson(\lambda)$-distribution with $\lambda$ unknown. Their joint density would be: \begin{align*} P(X_1=k_1, \dots, X_n=k_n)& = P(X_1=k_1) \cdot P(X_2=k_2) \cdot \ldots \cdot P(X_n=k_n)\\ & =\prod_{i=1}^n\frac{\lambda^{k_i}}{{k_i}!} \exp(-\lambda) \end{align*}

which depends on the unknown parameter $\lambda$.

The idea of maximum likelihood is to look at the joint density function as a function of the unknown parameter $\lambda$ and maximize this target over all possible values of $\lambda$.
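As a concrete illustration (this is an addition, not part of the original answer), one can evaluate the joint log-likelihood on a grid of candidate values of $\lambda$ and pick the maximizer. The observed counts and the grid below are made up purely for the example:

```python
import numpy as np
from math import lgamma

# Hypothetical observed counts (made up purely for illustration).
data = np.array([2, 4, 3, 5, 1, 3, 4, 2])

def log_likelihood(lam, ks):
    """Joint Poisson log-likelihood: sum_i [k_i*log(lam) - log(k_i!)] - n*lam."""
    log_factorials = np.array([lgamma(k + 1) for k in ks])
    return np.sum(ks * np.log(lam) - log_factorials) - len(ks) * lam

# Crude grid search over candidate values of lambda.
grid = np.linspace(0.1, 10.0, 1000)
values = [log_likelihood(lam, data) for lam in grid]
lam_hat = grid[int(np.argmax(values))]

print(lam_hat)      # close to the sample mean ...
print(data.mean())  # ... which is the closed-form MLE derived below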

To better understand why we should use the joint density and not the "marginal" density of single observation we have to take a look at the result.

It is well known that the maximum likelihood estimator in the current case is $\widehat{\lambda}_n = \frac{\sum_{i=1}^nX_i}{n}.$
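That closed form follows from a short, standard derivation (sketched here as an addition to the original answer): take the logarithm of the joint density and set its derivative with respect to $\lambda$ to zero,

$$\ell(\lambda) = \log\left(\prod_{i=1}^n\frac{\lambda^{k_i}}{k_i!}e^{-\lambda}\right) = \left(\sum_{i=1}^n k_i\right)\log\lambda - \sum_{i=1}^n\log(k_i!) - n\lambda,$$

$$\ell'(\lambda) = \frac{\sum_{i=1}^n k_i}{\lambda} - n = 0 \quad\Longrightarrow\quad \widehat{\lambda}_n = \frac{\sum_{i=1}^n k_i}{n},$$

i.e. the sample mean of the observed counts (and one can check that this critical point is indeed a maximum).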

But note that we have (since $X_1,\dots, X_n$ are i.i.d.): $$E(\widehat{\lambda}_n) = \lambda$$ as well as $$Var(\widehat{\lambda}_n) = \frac{\lambda}{n}.$$
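Both identities follow from linearity of expectation and the independence of the $X_i$, together with the fact that a $Poisson(\lambda)$ variable has mean and variance equal to $\lambda$:

$$E(\widehat{\lambda}_n) = \frac{1}{n}\sum_{i=1}^n E(X_i) = \frac{n\lambda}{n} = \lambda, \qquad Var(\widehat{\lambda}_n) = \frac{1}{n^2}\sum_{i=1}^n Var(X_i) = \frac{n\lambda}{n^2} = \frac{\lambda}{n}.$$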

From this it is clear that $\widehat{\lambda}_n$ is an unbiased estimator of $\lambda$ for every $n$ (since $E(\widehat{\lambda}_n)$ does not depend on $n$), but the variance of this estimator decreases with the sample size. Hence using all $n$ observations from the sample, and not only a single one (i.e. $n=1$), leads to a "better"/more precise estimator! (This tells you: don't maximize your second expression, since you can do better by maximizing the first one.)

It turns out that throwing away information (looking at $n=1$ instead of $n>1$) is not a good idea. This is very often the case in statistics.
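A small simulation (again an addition to the original answer, with an arbitrary choice of $\lambda$ and of sample sizes) makes this concrete: the estimator based on a single observation fluctuates far more around the true value than the one based on many observations, while both are unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed; an arbitrary choice
true_lambda = 3.0                # made-up "true" parameter
n_repeats = 10_000               # number of simulated data sets per sample size

for n in (1, 10, 100):
    # Draw n_repeats data sets of size n and compute the MLE (the sample mean) for each.
    samples = rng.poisson(true_lambda, size=(n_repeats, n))
    mle = samples.mean(axis=1)
    # The empirical variance of the MLE should be close to lambda / n.
    print(f"n={n:4d}  mean of MLE = {mle.mean():.3f}  variance of MLE = {mle.var():.3f}")
```

With these settings the printed means should all be close to $3$, while the printed variances should be close to $\lambda/n$, i.e. roughly $3$, $0.3$ and $0.03$.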
