[Math] Intuition behind Fisher information and expected value

statistics

I am learning stats. On page 128 of my book, All of Statistics 1e, it explains that the Fisher information is the variance of the score function. It then goes on to say that when $n = 1$

$$I(\theta) = -E_{\theta}\left(\frac{\partial^2 \log f(X;\theta)}{\partial\theta^2}\right)$$

where $f(x;\theta)$ is, I think, the pdf with parameters $\theta$.

I am trying to get an intuition for what that definition is saying. Why would the variance of the score function be equal to the negative of the expected value of the second partial derivative of the log pdf with respect to $\theta$? In googling around I've found some videos on the Cramér-Rao lower bound, and that seems related. I'm way over my head mathematically (in part to build up my skills), so it would be great if someone could really break down what is going on.

Best Answer

The Fisher information only has a precise meaning when you are dealing with a normally distributed variable. In that case, the log-likelihood function is parabolic, and the Fisher information equals the curvature at the MLE. It turns out mathematically that the curvature of the log-likelihood is the inverse of the variance of the associated normal random variable.

This is what guides the intuition surrounding Fisher information, even though the relationship holds only approximately for non-normal variables (although, subject to some technical conditions, it is usually true asymptotically). Also, the inverse of the Fisher information serves as a lower bound on the variance of any unbiased estimator; this is the Cramér-Rao lower bound you came across.
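As a quick numerical sanity check of the identity in your question, here is a rough simulation sketch for a non-normal example, a single Poisson($\lambda$) observation (the value $\lambda = 3$ is just illustrative): both the variance of the score and the negative expected second derivative of the log pmf should come out to $1/\lambda$.

```python
import numpy as np

# For a single observation X ~ Poisson(lam):
#   log f(x; lam)     = x*log(lam) - lam - log(x!)
#   score             = d/d lam   log f = x/lam - 1
#   second derivative = d^2/d lam^2 log f = -x/lam**2
# The identity says Var(score) = -E[second derivative]; both equal 1/lam here.

rng = np.random.default_rng(0)
lam = 3.0                                   # illustrative value
x = rng.poisson(lam, size=1_000_000)

score = x / lam - 1.0
second_deriv = -x / lam**2

print("Var(score)            :", score.var())           # ~ 1/lam
print("-E[d^2 log f / dlam^2]:", -second_deriv.mean())  # ~ 1/lam
print("1/lam                 :", 1.0 / lam)
```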

Demonstration of the relationship between $I$ and $\sigma$ for the Gaussian likelihood of the mean

Let $f(x;\theta) := f(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

Take the logarithm of this:

$$\log(f) = -\log\sqrt{2\pi} - \log\sigma -\frac{(x-\mu)^2}{2\sigma^2}$$

We are looking at the likelihood for the mean ($\mu$) given a sample of data $\mathbf{x} = (x_1, \dots, x_n)$, so we treat the above as a function of $\mu$. Summing the log-density over the sample gives the log-likelihood for $\mu$:

$$L(\mu|\mathbf{x},\sigma) =-n\log\sqrt{2\pi} - n\log\sigma -\frac{1}{2\sigma^2}\sum\limits_{x_i \in \mathbf{x}}(x_i-\mu)^2$$

This function is quadratic in $\mu$, so its second derivative $\frac{d^2L}{d\mu^2}$ is a constant. Specifically:

$$\frac{dL}{d\mu} =\frac{1}{\sigma^2}\sum\limits_{x_i \in \mathbf{x}}(x_i-\mu) \implies \frac{d^2L}{d\mu^2} = \frac{-n}{\sigma^2} = \textrm{constant} $$

Therefore,

$$-E_{\mu}\left(\frac{-n}{\sigma^2}\right) = \frac{n}{\sigma^2} = I(\mu)$$

which is the Fisher information about $\mu$ for a sample of size $n$. Note, however, that

$$ se(\hat \mu) = \frac{\sigma}{\sqrt{n}} \implies I(\mu) = \frac{1}{se(\hat \mu)^2} = \frac{1}{\sigma^2_{\hat \mu}}$$

Therefore, the Fisher information is the inverse of the variance of the MLE.
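To see that conclusion numerically, here is a rough simulation sketch (with illustrative values $\mu = 0$, $\sigma = 2$, $n = 25$): the variance of the MLE $\hat\mu = \bar{x}$ across repeated samples should come out close to $\sigma^2/n = 1/I(\mu)$.

```python
import numpy as np

# Repeatedly draw samples of size n from N(mu, sigma^2), compute the MLE
# (the sample mean) for each, and compare its variance to 1/I(mu) = sigma^2/n.

rng = np.random.default_rng(1)
mu, sigma, n = 0.0, 2.0, 25                 # illustrative values
n_reps = 200_000

samples = rng.normal(mu, sigma, size=(n_reps, n))
mu_hat = samples.mean(axis=1)               # MLE of mu in each replicate

print("empirical Var(mu_hat):", mu_hat.var())
print("sigma^2 / n = 1/I(mu):", sigma**2 / n)
```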
