Pearson Correlation Coefficient – What is the Uncertainty on the Pearson Correlation Coefficient?

data analysis, st.statistics

What is the uncertainty on the Pearson correlation coefficient as a function of the measurement uncertainty in the data set?

I know of an expression giving the uncertainty related to the limited size of the data set, but I'm looking for the uncertainty related to the measurement of the data itself, which I think must dominate.

Many thanks!

Best Answer

Suppose that the variables $X,X'$ are fixed with means $\mu, \mu'$, standard deviations $\sigma, \sigma'$, and correlation $\rho$. Suppose they are observed with errors $Z,Z'$ that are normally distributed with mean 0 and standard deviations $\epsilon, \epsilon'$. Then to a first approximation:

\begin{align}
Var[\text{observed }\rho] &= Var\left[\frac{n \sum (X+Z)(X'+Z') - \sum(X+Z)\sum(X'+Z')}{n^2 \, SD[X+Z] \, SD[X'+Z']}\right] \\[1ex]
&\simeq Var\left[\frac{n \sum (X+Z)(X'+Z') - n^2 \mu \mu'}{n^2 \sqrt{(\sigma^2+\epsilon^2)(\sigma'^2+\epsilon'^2)}}\right] \\[1ex]
&= \frac{Var\left[\sum (X+Z)(X'+Z')\right]}{n^2 (\sigma^2+\epsilon^2)(\sigma'^2+\epsilon'^2)} \\[1ex]
&= \frac{E\left[\big(\sum (XX'+XZ'+ZX'+ZZ')\big)^2\right]-n^2\mu^2 \mu'^2}{n^2 (\sigma^2+\epsilon^2)(\sigma'^2+\epsilon'^2)} \\[1ex]
&= \frac{E\left[\sum (X^2X'^2+X^2Z'^2+Z^2X'^2+Z^2Z'^2)\right]-n^2\mu^2 \mu'^2}{n^2 (\sigma^2+\epsilon^2)(\sigma'^2+\epsilon'^2)} \\[1ex]
&= \frac{(\mu^2+\sigma^2+\epsilon^2)(\mu'^2+\sigma'^2+\epsilon'^2)+2\rho\sigma\sigma'(2\mu\mu'+\rho\sigma\sigma')-\mu^2 \mu'^2}{(\sigma^2+\epsilon^2)(\sigma'^2+\epsilon'^2)}
\end{align}
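Since the expression above is only a first approximation, it may be worth checking it numerically for your own data. The following is a minimal Monte Carlo sketch, not part of the derivation: it holds $X, X'$ fixed, repeatedly adds independent Gaussian errors $Z, Z'$ with standard deviations $\epsilon, \epsilon'$, and reports the mean and spread of the observed correlation. The function name `observed_corr_spread`, the synthetic data, and the noise levels are all hypothetical choices made for illustration, and independence of the errors is an assumption.

```python
import numpy as np

def observed_corr_spread(x, x_prime, eps, eps_prime, n_trials=2000, seed=0):
    """Monte Carlo estimate of the mean and standard deviation of the
    observed Pearson r when fixed data are measured with Gaussian errors.

    Assumes the errors Z, Z' are independent of each other and of the data.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    x_prime = np.asarray(x_prime, dtype=float)
    n = x.size

    # One row of measurement errors per simulated repeat of the experiment.
    z = rng.normal(0.0, eps, size=(n_trials, n))
    z_prime = rng.normal(0.0, eps_prime, size=(n_trials, n))

    # Observed values X + Z and X' + Z', centred within each trial.
    a = x + z
    b = x_prime + z_prime
    a_c = a - a.mean(axis=1, keepdims=True)
    b_c = b - b.mean(axis=1, keepdims=True)

    # Pearson r for each trial.
    r = (a_c * b_c).sum(axis=1) / np.sqrt(
        (a_c ** 2).sum(axis=1) * (b_c ** 2).sum(axis=1)
    )
    return r.mean(), r.std(ddof=1)

# Hypothetical example data: n = 200 points with true correlation near 0.8.
data_rng = np.random.default_rng(1)
x = data_rng.normal(10.0, 2.0, size=200)
x_prime = 0.8 * x + data_rng.normal(0.0, 1.2, size=200)

clean_r = np.corrcoef(x, x_prime)[0, 1]
mean_r, sd_r = observed_corr_spread(x, x_prime, eps=0.5, eps_prime=0.5)
print("r without measurement error:", clean_r)
print("mean observed r:", mean_r)
print("std of observed r from measurement error:", sd_r)
```

The standard deviation returned here is the uncertainty on the coefficient due to measurement error alone, since the underlying data are held fixed across trials. Comparing it with the finite-sample expression you already have should show which of the two dominates for your data set.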
