Sample Pearson correlation coefficient


Given paired data $\left\{(x_{1},y_{1}),\ldots ,(x_{n},y_{n})\right\}$ consisting of $n$ iid pairs (with $x_i$ and $y_i$ independent), the sample correlation coefficient $r_{xy}$ is defined as:

$$ r_{xy}={\frac {\sum _{i=1}^{n}(x_{i}-{\bar {x}})(y_{i}-{\bar {y}})}{{\sqrt {\sum _{i=1}^{n}(x_{i}-{\bar {x}})^{2}}}{\sqrt {\sum _{i=1}^{n}(y_{i}-{\bar {y}})^{2}}}}}$$

where $Ex_1=\mu_1$, $Ey_1=\mu_2$, $Var [x_1]=\sigma_1^2$, $Var [y_1]=\sigma_2^2$.

What is the limiting distribution of $\frac{\sqrt{n} \, r_{xy}}{\sqrt{1-r_{xy}^2}}$?

Best Answer

Here is an outline for how you can approach this.

  1. Based on your updated question, proceed directly to step 2

First, as you seem to have noted, it suffices to find the asymptotic distribution of your expression with $\sqrt n$ in place of $\sqrt{n-2}$ since

$$\sqrt{n-2}=\left(\frac{\sqrt{n-2}}{\sqrt n}\right)\sqrt n$$

and the first factor is deterministic and converges to 1, so by Slutsky's theorem the limiting distribution is unchanged.

  2. I also suggest writing the sample correlation as

$$ r_{xy}={\frac{\frac{1}{n}\sum _{i=1}^{n}x_{i}y_{i}-(\frac{1}{n}\sum _{i=1}^{n}x_{i})(\frac{1}{n}\sum _{i=1}^{n}y_{i})}{{\sqrt {\left(\frac{1}{n}\sum _{i=1}^{n}x_{i}^{2} -(\frac{1}{n}\sum _{i=1}^{n}x_{i})^2\right)\left(\frac{1}{n}\sum _{i=1}^{n}y_{i}^{2} -(\frac{1}{n}\sum _{i=1}^{n}y_{i})^2\right)}}}}.$$

We'll call the population correlation coefficient $r$ (this is the same as the above expression, just replacing sample averages with expectations).
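As a quick sanity check on this rewriting, here is a minimal sketch comparing the centered-sum definition from the question with the raw-moment form above. The distributions, seed, sample size, and the helper names `r_centered` and `r_moments` are illustrative assumptions, not part of the original problem.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed
n = 1_000
x = rng.normal(loc=1.0, scale=2.0, size=n)  # illustrative choice of distribution
y = rng.exponential(scale=3.0, size=n)      # drawn independently of x

def r_centered(x, y):
    """Sample correlation from the centered-sum definition."""
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

def r_moments(x, y):
    """Sample correlation written in terms of raw sample moments."""
    a, b = x.mean(), y.mean()
    c, d, e = (x**2).mean(), (y**2).mean(), (x * y).mean()
    return (e - a * b) / np.sqrt((c - a**2) * (d - b**2))

r1, r2 = r_centered(x, y), r_moments(x, y)
print(r1, r2, np.corrcoef(x, y)[0, 1])  # all three should agree up to rounding
```

The two forms are algebraically identical; the raw-moment version is the one that makes $r_{xy}$ visibly a function of five sample averages, which is what the next steps rely on.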

  3. Now to the meat of the problem:

Define

$$\begin{gathered} w_i\equiv (x_i,y_i,x_i^2,y_i^2,x_iy_i)',\\ \theta\equiv E[w_i],\quad \Sigma\equiv \text{Var}(w_i). \end{gathered}$$

Then, if the pairs $(x_i,y_i)$ are iid and $x_i,y_i$ have finite fourth moments, the multivariate CLT tells us

$$\sqrt n \left(\frac{1}{n}\sum_{i=1}^n w_i-\theta\right)\to_d N(0,\Sigma).$$

Note that you must know fourth order moments to know $\Sigma$; if you don't have explicit parameters for these, I would just leave "$\Sigma$" as is.
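If the fourth-order moments are not available in closed form, one option is to estimate $\theta$ and $\Sigma$ by plugging in sample moments. The sketch below does this with NumPy; the distributions, seed, sample size, and the names `W`, `theta_hat`, and `Sigma_hat` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary fixed seed
n = 200_000
x = rng.normal(1.0, 2.0, size=n)   # illustrative: mean 1, standard deviation 2
y = rng.exponential(3.0, size=n)   # illustrative: mean 3, independent of x

# Each row of W is w_i = (x_i, y_i, x_i^2, y_i^2, x_i y_i)'.
W = np.column_stack([x, y, x**2, y**2, x * y])

theta_hat = W.mean(axis=0)           # plug-in estimate of theta = E[w_i]
Sigma_hat = np.cov(W, rowvar=False)  # 5x5 plug-in estimate of Sigma = Var(w_i)

print(theta_hat)
print(Sigma_hat.shape)
```

For these particular distributions the true value is $\theta=(1,3,5,18,3)'$, so `theta_hat` should land close to that.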

  4. From here, I would use the delta method to find the asymptotic distribution of

$$\sqrt n \left(\underbrace{g\left(\frac{1}{n}\sum_{i=1}^n w_i\right)}_{=r_{xy}/\sqrt{1-r_{xy}^2}}-\underbrace{g(\theta)}_{=r/\sqrt{1-r^2}}\right)$$

where $g:\mathbb{R}^5\to \mathbb{R}$ is defined as

$$\begin{gathered} g\equiv f\circ h,\\ h:\mathbb{R}^5\to\mathbb{R},\quad h(a,b,c,d,e)\equiv \frac{e-ab}{\sqrt{(c-a^2)(d-b^2)}},\\ f:(-1,1)\to\mathbb{R},\quad f(a)\equiv \frac{a}{\sqrt{1-a^2}}. \end{gathered}$$
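The delta method gives the limit $N\left(0,\nabla g(\theta)'\,\Sigma\,\nabla g(\theta)\right)$. When $x_i$ and $y_i$ are independent, as in the question, $r=0$, so $g(\theta)=0$ and the quadratic form reduces to $E[(x_1-\mu_1)^2(y_1-\mu_2)^2]/(\sigma_1^2\sigma_2^2)=1$, which suggests a standard normal limit. The following is a rough numerical check of that claim, not a proof: the distributions, seed, sample sizes, finite-difference step, and the helper `num_grad` are all assumptions chosen for illustration.

```python
import numpy as np

def g(w):
    """g = f(h(w)) with h the moment form of the correlation, f(a) = a / sqrt(1 - a^2)."""
    a, b, c, d, e = w
    rho = (e - a * b) / np.sqrt((c - a**2) * (d - b**2))
    return rho / np.sqrt(1 - rho**2)

def num_grad(func, point, eps=1e-6):
    """Central-difference gradient (hypothetical helper for this sketch)."""
    grad = np.zeros_like(point)
    for k in range(point.size):
        step = np.zeros_like(point)
        step[k] = eps
        grad[k] = (func(point + step) - func(point - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(2)  # arbitrary fixed seed

# Illustrative setup: x ~ N(1, 2^2), y ~ Exponential(scale 3), independent.
# Exact moments: theta = (E x, E y, E x^2, E y^2, E xy) = (1, 3, 5, 18, 3).
theta = np.array([1.0, 3.0, 5.0, 18.0, 3.0])
grad = num_grad(g, theta)

# Estimate Sigma = Var(w_i) from one large sample, then form the delta-method variance.
m = 400_000
x = rng.normal(1.0, 2.0, size=m)
y = rng.exponential(3.0, size=m)
Sigma_hat = np.cov(np.column_stack([x, y, x**2, y**2, x * y]), rowvar=False)
avar = grad @ Sigma_hat @ grad

# Monte Carlo: simulate sqrt(n) * r / sqrt(1 - r^2) over many independent samples.
reps, n = 2_000, 500
xs = rng.normal(1.0, 2.0, size=(reps, n))
ys = rng.exponential(3.0, size=(reps, n))
dx = xs - xs.mean(axis=1, keepdims=True)
dy = ys - ys.mean(axis=1, keepdims=True)
r = (dx * dy).sum(axis=1) / np.sqrt((dx**2).sum(axis=1) * (dy**2).sum(axis=1))
t = np.sqrt(n) * r / np.sqrt(1 - r**2)

print("delta-method variance:", avar)
print("simulated mean and variance:", t.mean(), t.var())
```

Both the delta-method variance and the simulated variance should come out near 1, with the simulated mean near 0. If $x_i$ and $y_i$ were dependent, $g(\theta)$ would be nonzero and the statistic would need to be centered at $g(\theta)$ as in the display above.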
