Solved – For a Fisher Information matrix $I(\theta)$ of multiple variables, is it true that $I(\theta) = nI_1(\theta)$

Tags: fisher-information, inference, mathematical-statistics

For a Fisher information matrix $I(\theta)$ of multiple variables, is it true that $I(\theta) = nI_1(\theta)$? That is, if $\theta = (\theta_1, \ldots, \theta_k)$, will the Fisher information matrix of the multiple parameters for the entire dataset simply be $n$ times the Fisher information matrix of a single data point, assuming the data are iid?

Update: As a concrete example, consider a sequence of random variables $y_1, \ldots, y_n$ such that $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, where the $\epsilon_i$ are assumed to be i.i.d. $N(0,\sigma^2)$ with $\sigma^2$ known. Additionally, assume that $n$ is even. I am trying to find the Fisher information matrix for $\beta = (\beta_0, \beta_1)$. I know that the log-likelihood for one observation is:

$$
l(\beta_0, \beta_1) \propto -\frac{1}{2\sigma^2}(y-\beta_0-\beta_1x)^2
$$

Hence, the second derivatives of the log-likelihood for one observation (whose negative expectation gives the information) are:

$\frac{\partial^2 l}{\partial \beta_0^2}= \frac{-1}{\sigma^2}$, $\frac{\partial^2 l}{\partial \beta_1^2}= \frac{-x^2}{\sigma^2}$, and $\frac{\partial^2 l}{\partial \beta_0 \partial \beta_1} = \frac{-x}{\sigma^2}$.

Thus the information matrix for a single observation is:
$$
-E\left(\frac{\partial^2 l}{\partial \beta\,\partial \beta^T}\right) =\frac{1}{\sigma^2}\left( \begin{array}{cc}
1 & x \\
x & x^2 \end{array}\right)
$$

and the information matrix for n pairs of observations $(x_i, y_i)$ is given by:

$$
I(\beta_0, \beta_1) =\frac{1}{\sigma^2}\left( \begin{array}{cc}
n & \sum_{i=1}^{n}x_i \\
\sum_{i=1}^{n}x_i & \sum_{i=1}^{n}x_i^2 \end{array}\right)
$$

Above, because the Fisher information is additive, all we did to move from the single-observation case to the multiple-observation case was to add entry by entry. HOWEVER, I know that in general, if $Y_1, \ldots, Y_n$ are iid, then $I(\theta) = nI_1(\theta)$. My question is: why is it NOT the case above that we could have just multiplied each entry by $n$, and instead had to add?
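To make the contrast concrete, here is a minimal numerical sketch of the regression example (my own illustration: the value $\sigma^2 = 1$, the covariate values, and the helper `info_single` are all assumptions, not from the post). It sums the per-observation information matrices and compares the result with naively multiplying the first observation's matrix by $n$.

```python
import numpy as np

# Illustrative choices (not from the post): known noise variance and fixed covariates.
sigma2 = 1.0
x = np.array([0.5, 1.3, 2.0, 3.7])  # n = 4 covariate values
n = len(x)

def info_single(xi, sigma2):
    """Fisher information of one observation in y_i = b0 + b1*x_i + eps_i,
    eps_i ~ N(0, sigma2): (1/sigma2) * [[1, x_i], [x_i, x_i^2]]."""
    return np.array([[1.0, xi], [xi, xi**2]]) / sigma2

# Additivity: the total information is the sum of the per-observation matrices.
I_total = sum(info_single(xi, sigma2) for xi in x)

# Naive "n * I_1": only matches the sum when all x_i are equal.
I_naive = n * info_single(x[0], sigma2)

print(I_total)
print(I_naive)
print(np.allclose(I_total, I_naive))  # False for distinct x_i
```

With distinct $x_i$ the two matrices differ; they agree only when all covariates coincide, which is exactly the situation in which the observations are truly iid.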

Best Answer

Since the Wikipedia article https://en.wikipedia.org/wiki/Fisher_information does not contain a proof, I will write one here. Let $X_1, X_2, \dotsc, X_n$ be independent random variables with density functions $f_i(x;\theta)$ (which might in addition depend on known covariates, so this covers more than the iid case). Then the log-likelihood function is
$$ \ell(\theta) = \sum_i \log f_i(X_i;\theta) $$
and the score function is
$$ s(\theta) = \frac{\partial \ell(\theta)}{\partial \theta} = \sum_i \frac{\partial}{\partial\theta}\log f_i(X_i;\theta). $$
The Fisher information matrix is
$$ \DeclareMathOperator{\E}{\mathbb{E}} I(\theta) = \E\left[ s(\theta)\, s(\theta)^T \mid \theta \right]. $$
When the product of the two sums is expanded, the cross terms vanish because the scores of independent variables are independent and each has expectation zero, so
$$ I(\theta) = \sum_i \E\left[ \left( \frac{\partial}{\partial\theta}\log f_i(X_i;\theta) \right)\left( \frac{\partial}{\partial\theta}\log f_i(X_i;\theta) \right)^T \mid \theta\right] = \sum_i I_i(\theta), $$
where $I_i(\theta)$ is the Fisher information from the variable $X_i$. Fisher information is therefore additive over independent observations. In the iid case every $I_i(\theta)$ equals $I_1(\theta)$, and the sum becomes $I(\theta) = n I_1(\theta)$. In your regression example the observations are independent but not identically distributed (each has its own $x_i$), so the per-observation matrices must be added rather than the first one multiplied by $n$.
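As a quick empirical check of the additivity argument, here is a small Monte Carlo sketch (my own illustration: the iid $N(\mu,\sigma^2)$ model, the parameter values, and the helper `score` are assumptions, not part of the answer). It estimates $\mathbb{E}[s(\theta)s(\theta)^T]$ for a sample of size $n$ by simulation and compares it with $n I_1(\theta)$ computed from the known single-observation information.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameter values (my choice, not from the answer).
mu, sigma2, n, reps = 1.0, 2.0, 5, 200_000

def score(x, mu, sigma2):
    """Per-sample score of N(mu, sigma2) with respect to theta = (mu, sigma2)."""
    d_mu = (x - mu) / sigma2
    d_s2 = -0.5 / sigma2 + (x - mu) ** 2 / (2 * sigma2 ** 2)
    return np.stack([d_mu, d_s2], axis=-1)

# Monte Carlo estimate of I(theta) = E[s(theta) s(theta)^T] for a sample of size n.
X = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
S = score(X, mu, sigma2).sum(axis=1)              # total score per replication
I_hat = (S[:, :, None] * S[:, None, :]).mean(axis=0)

# Closed-form single-observation information for N(mu, sigma2), so n * I_1 is the target.
I1 = np.array([[1 / sigma2, 0.0],
               [0.0, 1 / (2 * sigma2 ** 2)]])

print(I_hat)
print(n * I1)  # the two should agree up to Monte Carlo error
```

Up to simulation noise, the estimated information of the full sample matches $n$ times the single-observation information, as the proof says it should in the iid case.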