Correlation Analysis – How to Derive t-test Formula for Correlation Coefficient

correlation

The formula for the t-test on significance of the correlation coefficient is given as follows (e.g. https://newonlinecourses.science.psu.edu/stat501/node/259/):

$t^* = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$

How is this formula derived? It looks different from the standard t-test formula given e.g. here: https://en.wikipedia.org/wiki/Student%27s_t-test, presumably because the quantity being tested is a correlation coefficient.

Best Answer

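A t-test is a test on a statistic that has a t-distribution under the null hypothesis. A variable $Z$ has a t-distribution with $\nu$ degrees of freedom if it can be written as $Z = X/\sqrt{Y/\nu}$, where $X$ is a standard Normal variable, $Y$ is a $\chi^2$-distributed variable with $\nu$ degrees of freedom, and $X$ and $Y$ are independent. For the familiar one-sample t-test, the numerator is the sample mean of some IID data (minus its hypothesized value), which is exactly Normally distributed for Normal data and approximately so by the central limit theorem otherwise. The denominator is the estimated standard error of the mean, $s/\sqrt{n}$, whose square is proportional to a $\chi^2$-distributed variable with $n-1$ degrees of freedom. Their ratio therefore follows a t-distribution with $n-1$ degrees of freedom.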
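To make this construction concrete, the one-sample t-statistic can be computed two ways: directly as $(\bar{x}-\mu)/(s/\sqrt{n})$, and as a standard Normal variable divided by the square root of a $\chi^2$ variable over its degrees of freedom. The following is a minimal Python sketch using only the standard library. The function name, the seed, and the values of `mu` and `sigma` are illustrative assumptions, not part of the original answer.

```python
import math
import random
import statistics


def one_sample_t_two_ways(data, mu, sigma):
    """Return the one-sample t-statistic computed directly and as X / sqrt(Y / nu).

    mu and sigma are the (assumed known) population mean and SD used to
    build the standard Normal numerator and the chi-square denominator.
    """
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample SD with n - 1 in the denominator

    # Usual textbook form: estimate divided by its estimated standard error.
    t_usual = (xbar - mu) / (s / math.sqrt(n))

    # Same quantity written as Normal / sqrt(chi-square / degrees of freedom).
    x_std_normal = (xbar - mu) / (sigma / math.sqrt(n))  # N(0, 1) under the null
    y_chi_square = (n - 1) * s**2 / sigma**2             # chi-square with n - 1 df
    t_ratio = x_std_normal / math.sqrt(y_chi_square / (n - 1))

    return t_usual, t_ratio


random.seed(0)  # arbitrary seed, only for reproducibility
sample = [random.gauss(5.0, 2.0) for _ in range(12)]
t_usual, t_ratio = one_sample_t_two_ways(sample, mu=5.0, sigma=2.0)
print(t_usual, t_ratio)
```

The unknown `sigma` cancels in the ratio, which is why the t-statistic can be computed from the sample alone.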

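For correlation coefficients, under the null hypothesis that the population correlation coefficient equals 0, and assuming the data are bivariate Normal, the estimated standard error of the sample correlation is $SE(r)=\sqrt{\frac{1-r^2}{n-2}}$. The t-statistic is obtained by dividing the sample correlation coefficient $r$ by this standard error: $$ t=\frac{r}{SE(r)}=\frac{r}{\sqrt{\frac{1-r^2}{n-2}}} =\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} $$ and it follows a t-distribution with $n-2$ degrees of freedom.

One way to see where this comes from is the simple linear regression of $y$ on $x$. Write $S_{xx}=\sum_i (x_i-\bar{x})^2$ and $S_{yy}=\sum_i (y_i-\bar{y})^2$. The slope estimate is $b=r\sqrt{S_{yy}/S_{xx}}$, the residual sum of squares is $SSE=(1-r^2)S_{yy}$, and the estimated standard error of the slope is $SE(b)=\sqrt{\frac{SSE}{n-2}}\Big/\sqrt{S_{xx}}$. Therefore $$ \frac{b}{SE(b)}=\frac{r\sqrt{S_{yy}/S_{xx}}\,\sqrt{S_{xx}}\,\sqrt{n-2}}{\sqrt{(1-r^2)S_{yy}}}=\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} $$ so testing whether the correlation is 0 is the same as testing whether the regression slope is 0. Under the null hypothesis, $b$ is Normally distributed and $SSE/\sigma^2$ is $\chi^2$-distributed with $n-2$ degrees of freedom and independent of $b$, which gives the t-distribution.

Note that in both cases we get the t-statistic by dividing a Normally distributed estimate by its estimated standard error, the square of which is proportional to a $\chi^2$-distributed variable, and so they're actually really not that different.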
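The formula can also be checked numerically: the t-statistic computed from $r$ should coincide with the t-statistic for the slope in a simple linear regression of $y$ on $x$. Below is a minimal Python sketch using only the standard library. The function name and the data values are made up for illustration.

```python
import math


def correlation_t_two_ways(x, y):
    """Return (r, t computed from r, t computed from the regression slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Centered sums of squares and cross-products.
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

    # Sample correlation and the t-statistic from the formula in the question.
    r = s_xy / math.sqrt(s_xx * s_yy)
    t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

    # Slope of the regression of y on x and its estimated standard error.
    slope = s_xy / s_xx
    sse = s_yy - slope * s_xy  # residual sum of squares, equal to (1 - r^2) * s_yy
    se_slope = math.sqrt(sse / (n - 2)) / math.sqrt(s_xx)
    t_from_slope = slope / se_slope

    return r, t_from_r, t_from_slope


# Hypothetical data, chosen only to exercise the function.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.0, 1.0, 4.0, 3.0, 7.0, 5.0, 6.0, 9.0]
r, t_from_r, t_from_slope = correlation_t_two_ways(x, y)
print(r, t_from_r, t_from_slope)
```

The two t-values agree up to floating-point error. The p-value would then come from a t-distribution with $n-2$ degrees of freedom.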