Fisher Information – Calculating Fisher Information for $\rho$ in Bivariate Normal Distribution

asymptotics, inference, mathematical-statistics, maximum-likelihood, normal-distribution

I have often seen people use the Delta method to find the asymptotic distribution of $r$, the sample correlation coefficient, for bivariate normal data. This distribution is given by

$$\sqrt{n} \left( r-\rho \right) \xrightarrow{D} \mathcal{N} \left(0, \left(1-\rho^2\right)^2 \right)$$

and this is a well-known result (I know of the z-transform, but it is not necessary in this context). I understand the method, but what I have been wondering is why they do not do something simpler. By the invariance property of the MLE, applied to the sample means and variances, it is easy to show that the sample correlation coefficient is in fact the MLE for $\rho$. Now, since this is an MLE, under the regularity conditions it should follow the usual asymptotic distribution of the MLE, namely

$$\sqrt{n} \left(r - \rho \right)\xrightarrow{D} \mathcal{N} \left(0, I^{-1} (\rho) \right)$$

where $I(\rho)$ is the Fisher information for $\rho$. All that remains is to find $I(\rho)$. Differentiating the log of the bivariate normal density twice with respect to $\rho$ and taking the negative expectation, I believe one arrives at

$$I(\rho) = \frac{1+\rho^2}{\left(1-\rho^2\right)^2}$$
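One way to double-check this lengthy computation is symbolically. The sketch below (my own, using sympy and the standard bivariate normal density with zero means and unit variances) exploits the fact that the second derivative of the log-density is linear in the monomials $x^2$, $y^2$, $xy$, whose expectations are $1$, $1$, $\rho$:

```python
import sympy as sp

x, y, rho = sp.symbols('x y rho')

# Log-density of the standard bivariate normal (means 0, variances 1)
ell = (-sp.log(2*sp.pi) - sp.Rational(1, 2)*sp.log(1 - rho**2)
       - (x**2 + y**2 - 2*rho*x*y) / (2*(1 - rho**2)))

# Negative second derivative with respect to rho, expanded in x and y
d2 = sp.expand(-sp.diff(ell, rho, 2))

# Take expectations term by term: E[x^2] = E[y^2] = 1, E[x y] = rho
I_rho = (d2.coeff(x, 2).coeff(y, 0) * 1
         + d2.coeff(y, 2).coeff(x, 0) * 1
         + d2.coeff(x, 1).coeff(y, 1) * rho
         + d2.coeff(x, 0).coeff(y, 0))

print(sp.simplify(I_rho))
```

Simplifying the result should reproduce $(1+\rho^2)/(1-\rho^2)^2$.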

which, assuming I have not made a mistake in the lengthy computation, is very different from the above asymptotic variance, at least for $\rho$ not close to zero. I have even run a few simulations that show the delta-method variance to be far more accurate in most cases. The smaller asymptotic variance is in line with what one would expect from an MLE, yet it turns out to be a very bad approximation.
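A quick Monte Carlo makes this concrete. The following is a sketch of the kind of simulation described above (the sample size, seed, and replication count are my own arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n, reps = 0.7, 500, 5_000

# Simulate `reps` bivariate normal samples of size n with correlation rho
x = rng.standard_normal((reps, n))
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal((reps, n))

# Pearson sample correlation coefficient for each replication
xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean(axis=1, keepdims=True)
r = (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))

emp_var = n * r.var()                  # empirical variance of sqrt(n)*(r - rho)
delta_var = (1 - rho**2)**2            # delta-method asymptotic variance
crlb = (1 - rho**2)**2 / (1 + rho**2)  # 1/I(rho), the Cramer-Rao bound

print(emp_var, delta_var, crlb)
```

For $\rho = 0.7$ the empirical variance lands near the delta-method value $(1-\rho^2)^2 \approx 0.26$, not near $1/I(\rho) \approx 0.17$.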

It is not impossible that I have made a mistake somewhere, although I have checked again and again. If that is not the case, is there a conceptual mistake in the above reasoning? I have looked at some well-known books on inference, and nowhere do they mention the Fisher information for $\rho$, which I also find quite puzzling.

I would appreciate any insight. Thank you.

Best Answer

The OP clarified in a comment that he is examining the standard bivariate normal distribution, with the means fixed to zero and the variances fixed to unity, respectively,

$$f(x,y) = \frac{1}{2 \pi \sqrt{1-\rho^2}} \exp\left\{-\frac{x^2 +y^2 -2\rho xy}{2(1-\rho^2)}\right\} $$

In turn, this makes the distribution a member of the curved exponential family and, as I have shown in my answer to this post, the maximum likelihood estimator for $\rho$ in such a case does not equal the sample correlation coefficient. Specifically, the sample correlation coefficient is

$$\tilde r = \frac 1n\sum_{i=1}^nx_iy_i$$

Denoting by $\hat \rho$ the MLE for $\rho$, and writing $(1/n)\sum_{i=1}^n(x_i^2 +y_i^2) = (1/n)S_2$ for the sum of the sample variances of $X$ and $Y$, we obtain

$$\hat \rho: \hat \rho^3 -\tilde r \hat \rho^2 + \big[(1/n)S_2-1\big]\hat \rho -\tilde r=0$$

$$\Rightarrow \hat \rho\Big(\hat \rho^2 -\tilde r \hat \rho + \big[(1/n)S_2-1\big] \Big) = \tilde r$$

Doing the algebra, it is not difficult to conclude that we obtain $\hat \rho = \tilde r$ if, and only if, $(1/n)S_2 = 2$, i.e. only if it so happens that the sum of the sample variances equals the sum of the true variances. So in general, for finite samples,

$$\hat \rho \neq \tilde r$$
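As a sketch of this (the function name and the example numbers are my own), one can solve the cubic numerically and verify that the MLE coincides with $\tilde r$ exactly when $(1/n)S_2 = 2$:

```python
import numpy as np

def rho_mle(r_tilde, s2_over_n):
    """Real root in (-1, 1) of rho^3 - r_tilde*rho^2 + (s2_over_n - 1)*rho - r_tilde = 0."""
    roots = np.roots([1.0, -r_tilde, s2_over_n - 1.0, -r_tilde])
    real = roots[np.abs(roots.imag) < 1e-8].real
    return real[(real > -1.0) & (real < 1.0)][0]

# When the sum of sample variances equals the sum of true variances (= 2),
# the MLE equals the sample correlation coefficient:
print(rho_mle(0.5, 2.0))   # 0.5 (up to floating-point error)
# Otherwise the two estimators differ:
print(rho_mle(0.5, 2.2))
```

In the first call the cubic factors as $(\rho - 0.5)(\rho^2 + 1)$, so $0.5$ is its only real root; in the second, the root in $(-1,1)$ moves away from $\tilde r = 0.5$.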

Both estimators remain consistent, but this alone does not imply that the asymptotic distribution of the sample correlation coefficient attains the Cramér-Rao bound, which is the variance the OP found. And it does not.
