Conditional Probability – Demonstrating and Interpreting Fisher Matrix and Dual Covariance Matrix

conditional-probability, covariance, covariance-matrix, duality, fisher-information

I have a simple (or maybe not so simple) question about the interpretation of the link between the Fisher information matrix and its inverse, which is the covariance matrix.

How can I formulate the fact that multiplying a row of the covariance matrix element-wise with the corresponding column of the Fisher matrix, and summing, gives the value 1 (the product of the two matrices is the identity matrix)?

In other words, how can one prove that, starting from the Fisher information definition:

$$I_{ij}(\theta)=-\mathbb{E}_{Y}\left[\frac{\partial^{2}}{\partial \theta_{i}\,\partial \theta_{j}} \log p(Y \mid \theta)\right]$$

we have, for every $i$: $$\sum_{j=1}^{n} I_{ij}(\theta)\,\text{Cov}(\theta_{j},\theta_{i})=1$$
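Writing $\mathrm{Cov}$ for the matrix whose $(j,i)$ entry is $\text{Cov}(\theta_{j},\theta_{i})$ (my shorthand for the covariance matrix of the estimator), I believe this is the same as asking why

$$\left[I(\theta)\,\mathrm{Cov}\right]_{ii}=1 \quad \text{for all } i, \qquad \text{i.e.} \qquad I(\theta)\,\mathrm{Cov}=\mathbb{I}_{n}.$$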

It looks like the result of a total probability equal to 1, for example with a PDF $f(\theta)$:

$$P(-\infty < \theta < +\infty) = \int_{-\infty}^{+\infty} f(\theta) \text{d}\theta = 1$$

But I don't know how to make this element-wise product between a row and a column, which gives a value of 1, more explicit.

I would appreciate a rigorous demonstration involving the relevant quantities (I know that variance, covariance and Hessian terms appear as factors, but I get confused when trying to show that the value equals 1 for all diagonal elements, i.e. that we get an identity matrix).

Best Answer

The result that you seem to be referring to relates the inverse of the Fisher information matrix to the asymptotic covariance matrix of the maximum likelihood estimator (MLE). Note that this is an asymptotic result: it applies only in the limit of infinite sample size and is not a general property. More specifically, under some regularity conditions (which don't always hold), the distribution of the MLE converges, as the sample size tends to infinity, to a multivariate normal distribution with covariance matrix $I(\theta)^{-1}$.
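As a sanity check rather than a proof, here is a minimal numerical sketch (the normal-model example, the sample sizes and the variable names are my own choices, not from the question): for $n$ i.i.d. draws from $N(\mu, \sigma^2)$ the Fisher information matrix for $(\mu, \sigma^2)$ is known in closed form, and multiplying it by the empirical covariance matrix of the MLE over many simulated samples gives approximately the identity, while multiplying it by its exact inverse gives the identity by definition; the "row times column sums to 1" statement in the question is nothing more than that last fact.

```python
import numpy as np

# Sketch, not a proof: for n i.i.d. samples from N(mu, sigma2), the per-sample
# Fisher information matrix for theta = (mu, sigma2) is
# [[1/sigma2, 0], [0, 1/(2*sigma2**2)]].  The question's claim is just that
# this matrix (for the full sample) times the asymptotic covariance matrix of
# the MLE is the identity, so each row-by-column product sums to 1.

rng = np.random.default_rng(0)
mu, sigma2, n, reps = 2.0, 3.0, 500, 10_000

# Fisher information for the full sample of size n
I_n = n * np.array([[1.0 / sigma2, 0.0],
                    [0.0, 1.0 / (2.0 * sigma2**2)]])

# Empirical sampling covariance of the MLE (mu_hat, sigma2_hat) over many replications
samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
mu_hat = samples.mean(axis=1)
sigma2_hat = samples.var(axis=1)      # MLE of the variance uses 1/n, np.var's default
cov_mle = np.cov(np.vstack([mu_hat, sigma2_hat]))

# Close to the 2x2 identity for large n and many replications:
# row i of I_n times column i of cov_mle sums to ~1, off-diagonal sums to ~0.
print(I_n @ cov_mle)
print(I_n @ np.linalg.inv(I_n))       # exactly the identity, by definition of the inverse
```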

This implies that the MLE is asymptotically efficient, i.e. it attains the lowest possible variance achievable by any unbiased estimator (see the Cramér-Rao bound and the asymptotic efficiency of maximum likelihood).
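To make the efficiency claim concrete, here is a small toy comparison (again my own example, with hypothetical parameter values): for estimating the mean of a normal distribution with known variance, the MLE (the sample mean) attains the Cramér-Rao bound $\sigma^2/n$, whereas the sample median, which is also unbiased for the mean, has a larger variance.

```python
import numpy as np

# Toy illustration of asymptotic efficiency (my own example, not from the
# references): for N(mu, sigma2) with known sigma2, the Cramer-Rao bound for
# unbiased estimators of mu is sigma2 / n.  The MLE (sample mean) attains it;
# the sample median, also unbiased, does not.

rng = np.random.default_rng(1)
mu, sigma2, n, reps = 0.0, 1.0, 200, 20_000

samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
crb = sigma2 / n

print("Cramer-Rao bound  :", crb)
print("Var(sample mean)  :", samples.mean(axis=1).var())         # ~ crb
print("Var(sample median):", np.median(samples, axis=1).var())   # ~ (pi/2) * crb, larger
```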

You can find the details of the proof in most classic statistics textbooks (like this one), as well as in many lecture notes (search for "maximum likelihood asymptotic efficiency").