Solved – Two-tailed p-value for Pearson’s $r$

correlationhypothesis testingp-valuepythonstatistical significance

The SciPy implementation of Pearson's $r$ also gives a two-tailed $p$-value.

I understand that a $p$-value for a given correlation gives the probability of a correlation coefficient at least as big to be observed if the null hypothesis is true.

I find it hard to understand how this test can be two-tailed, however. What would be the meaning of a one-tailed $p$-value for non-correlation, then? Since $r$ is signed I think only a one-tailed p-value could satisfy the definition given above.

Best Answer

Ok, so this is actually quite easy:

A 2-tailed p-value gives the probability of a correlation at least as extreme as r to be observed if the true correlation is in fact zero.
A 1-tailed p-value gives the probability of a correlation at least as extreme as r to be observed if the true correlation is in fact zero, or of the opposite sign than r.

Related Solutions

One-Tailed Hypothesis Testing – Justification of One-Tailed Hypothesis Testing Explained

That's a thoughtful question. Many texts (perhaps for pedagogical reasons) paper over this issue. What's really going on is that $H_0$ is a composite "hypothesis" in your one-sided situation: it's actually a set of hypotheses, not a single one. It is necessary that for every possible hypothesis in $H_0$, the chance of the test statistic falling in the critical region must be less than or equal to the test size. Moreover, if the test is actually to achieve its nominal size (which is a good thing for achieving high power), then the supremum of these chances (taken over all the null hypotheses) should equal the nominal size. In practice, for simple one-parameter tests of location involving certain "nice" families of distributions, this supremum is attained for the hypothesis with parameter $\theta_0$. Thus, as a practical matter, all computation focuses on this one distribution. But we mustn't forget about the rest of the set $H_0$: that is a crucial distinction between two-sided and one-sided tests (and between "simple" and "composite" tests in general).

This subtly influences the interpretation of results of one-sided tests. When the null is rejected, we can say the evidence points against the true state of nature being any of the distributions in $H_0$. When the null is not rejected, we can only say there exists a distribution in $H_0$ which is "consistent" with the observed data. We are not saying that all distributions in $H_0$ are consistent with the data: far from it! Many of them may yield extremely low likelihoods.

Solved – the deal with $p$-value when generating Pearson’s $r$ correlation coefficient

[Fixed/improved, based on the feedback from @Momo and @whuber]

I believe that in the context of regression the relationship between $p$-value and Pearson's correlation coefficient is the following: $p$-value can be interpreted as probability that correlation (coefficient), determined in a random sampling-based experiment, is the same or larger than the one, determined from the observed data, provided that the null hypothesis is true. In other words, I think that $p$-value in this context is related to hypothesis testing, where hypotheses themselves are correlation-based, as follows:

\begin{multline} \shoveleft{H_0: \text{correlation (of the underlying data-generation process) is zero;}}\\ \shoveleft{H_A: \text{the correlation is not zero.}} \end{multline}

Then, the situation IMHO boils down to the following traditional hypothesis testing interpretation. If $p$-value is small (less than arbitrarily selected significance level $\alpha$, usually equal to 0.05), then you can reject the null hypothesis ("determined correlation is statistically significant"), and, if $p$-value is greater than $\alpha$, than you fail to reject the null ("the correlation is not statistically significant").

In regard to a relationship between $p$-value and sample size $N$, the following formulae present the relationship in question in a mathematical form.

Fisher transformed test statistic of $r$ (aka $z$) is defined as $T(r) = artanh(r)$.

For a bivariate normal distribution, $z$'s standard error depends on sample size $N$, as follows:

\begin{align} SE(T(r)) \approx \frac{1}{\sqrt{N - 3}} \end{align}

Moreover, since the test statistic is approximately normal,

\begin{align} \frac{T(r)}{SE(T(r))} \approx N(0,1) \text{ and } \lim_{N\to\infty} SE(T(r)) = 0 \end{align}

so the standard error in the denominator is getting increasingly smaller for increasingly larger $N$.

P.S. You may also find the following two answers relevant and useful: this and this.

Best Answer

Related Solutions

One-Tailed Hypothesis Testing – Justification of One-Tailed Hypothesis Testing Explained

Solved – the deal with $p$-value when generating Pearson’s $r$ correlation coefficient

Related Question