[Math] Fisher Information for Exponential RV

Let $X \sim \operatorname{Exp}(\lambda_0)$; i.e., an exponential random variable with true parameter $\lambda_0 > 0$. The density is then $f(x;\lambda_0) = \lambda_0 e^{-\lambda_0 x}$ for $x > 0$. For a given $\lambda > 0$, the Fisher information is defined as
\begin{align*}
I(\lambda) & := E\left( \left(\frac{\partial \log f(X; \lambda)}{\partial \lambda}\right)^2\right) \\
& = \int_0^\infty \left(\frac{\partial \log f(x; \lambda)}{\partial \lambda}\right)^2 \, f(x; \lambda) \, dx \\
& = \int_0^\infty \left(\frac{1}{\lambda} - x\right)^2 \, \lambda e^{-\lambda x} \, dx \\
& = \int_0^\infty \left(\frac{1}{\lambda^2} - \frac{2x}{\lambda} + x^2\right) \, \lambda e^{-\lambda x} \, dx \\
& = \frac{1}{\lambda^2} - \frac{2}{\lambda} \cdot \frac{1}{\lambda} + \frac{2}{\lambda^2} \\
& = \frac{1}{\lambda^2},
\end{align*}
where the third line uses $\frac{\partial \log f(x; \lambda)}{\partial \lambda} = \frac{\partial}{\partial \lambda}\left(\log \lambda - \lambda x\right) = \frac{1}{\lambda} - x$, and the fifth uses $E(X) = 1/\lambda$ and $E(X^2) = 2/\lambda^2$.

Here's a plot of $I(\lambda)$:

[Plot: Fisher information $I(\lambda) = 1/\lambda^2$]
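
As a sanity check, here is a quick Monte Carlo sketch I put together with NumPy (the sample size and seed are arbitrary choices): it estimates $E\left(\left(\frac{\partial \log f(X;\lambda)}{\partial \lambda}\right)^2\right)$ at $\lambda = 0.1$ by simulation and compares it to $1/\lambda^2 = 100$.

```python
import numpy as np

# Monte Carlo check that E[(d/dlambda log f(X; lambda))^2] = 1/lambda^2
# for X ~ Exp(lambda). Sample size and seed are arbitrary choices.
rng = np.random.default_rng(0)
lam = 0.1
n = 1_000_000

# NumPy parametrizes the exponential by its mean, i.e. scale = 1/lambda.
x = rng.exponential(scale=1.0 / lam, size=n)
score = 1.0 / lam - x  # d/dlambda log f(x; lambda) = 1/lambda - x

print(np.mean(score**2))  # Monte Carlo estimate, ~100
print(1.0 / lam**2)       # exact value: 100.0
```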

What exactly is the Fisher information telling me? As I understand it, the larger the Fisher information, the "more information" the random variable $X$ is giving me about my MLE estimate of $\lambda$. How am I supposed to use this here? I guess if my MLE estimate is $\hat{\lambda} = 0.1$, then $I(0.1) = 100$. Is this good? Is this the correct usage of Fisher information? But, I don't see how the actual value of the random variable $X$ affects this at all, nor do I see how the true parameter $\lambda_0$ affects this. Have I misinterpreted Fisher information?

Best Answer

You're right to say that the actual realization of the random variable $X$ does not affect the Fisher information: in the definition we integrate over the density of $X$, so the data are averaged out. The Fisher information does, however, depend on the parameter, so the true value $I(\lambda_0)$ is itself unknown in practice.

The Fisher information is the second moment of the score (equivalently its variance, since the score has mean zero at the true parameter). Intuitively, it tells you how sensitively the score reacts to different random draws of the data. The more sensitive this reaction is, the fewer draws (or observations) are needed to get a good estimate or to test a hypothesis. To see why, look at how we set the score equal to zero in order to get the MLE. For your example, the log-likelihood of a sample $x_1, \ldots, x_n$ is $n \log \lambda - \lambda \sum_i x_i$, so the first-order condition $n/\lambda - \sum_i x_i = 0$ gives $\hat{\lambda} = n / \sum_i x_i$: the MLE depends inversely on the observations. Since a small $\lambda$ implies a large variance of the (positive) $X$ itself, a few observations are already likely to result in a good estimate if $\lambda$ is small, consistent with $I(\lambda) = 1/\lambda^2$ being large for small $\lambda$.
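
To make this concrete, here is a minimal simulation sketch (my own illustration; the values of $n$, the number of replications, and the seed are arbitrary): it computes the MLE $\hat{\lambda} = n / \sum_i x_i$ across many simulated samples of size $n$ and compares its sampling variance with the asymptotic value $1/(n \, I(\lambda_0)) = \lambda_0^2 / n$ implied by the Fisher information.

```python
import numpy as np

# Illustration: the sampling variance of the MLE lambda_hat = n / sum(x_i)
# is approximately 1 / (n * I(lambda_0)) = lambda_0^2 / n.
# Values of n, reps, and the seed are arbitrary choices.
rng = np.random.default_rng(1)
lam0 = 0.1     # true parameter
n = 200        # observations per sample
reps = 20_000  # number of simulated samples

x = rng.exponential(scale=1.0 / lam0, size=(reps, n))
mle = n / x.sum(axis=1)  # MLE from the first-order condition

print(mle.var())    # simulated sampling variance, ~5e-05
print(lam0**2 / n)  # asymptotic variance 1/(n*I(lambda_0)) = 5e-05
```

With a small $\lambda_0$ (large Fisher information), the simulated variance is small even for moderate $n$, which is the sense in which fewer observations are needed.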
