Significant Correlation – What Does It Mean?

correlation, p-value, statistical significance

I computed correlations between several variables. R makes it possible to separate significant from non-significant correlations.
I noticed that low correlations (corr=0.4) can be significant (p.value<0.05), while non-significant correlations (p.value>0.05) can take relatively high values (corr=0.7).

Given this, I would like to know what exactly it means to say that one correlation is more significant than another. What arguments could I use to explain the situation above?

Best Answer

The reported $p$-value usually comes from a test of whether one can infer that the "true" (population) correlation $\rho$ is non-zero: $$ \begin{align} H_0&: \textrm{The two variables are uncorrelated. } &(\rho = 0) \\ H_a&: \textrm{The two variables are correlated. } &(\rho \ne 0) \end{align} $$

One generally has access only to a sample of values from the two variables of interest. You might imagine that it's easy to detect a strong correlation between two variables even from a small sample, but more data is required to determine whether an apparent relationship is a weak correlation or just noise. The formula for the test statistic backs up this intuition: it is a function only of the sample size ($n$) and the sample correlation ($r$). One way to run the test is via the $t$ distribution. You compute:

$$t^* = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

then use the $t_{n-2}$ distribution to convert this into a $p$-value, which tells you the probability of seeing a sample correlation at least this large in magnitude if the population correlation is zero. Other approaches use a slightly different "exact" formula, which is again a function only of $r$ and $n$ and can be interpreted in the same way.
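The recipe above can be sketched in plain Python to show how the same $r$ gives very different $p$-values at different $n$. This is a minimal sketch, assuming the usual setting for this test (roughly bivariate-normal data, two-sided alternative). The helper names `corr_t_test` and `betainc_reg` are hypothetical, and the $t_{n-2}$ tail probability is obtained through the identity $p = I_{\nu/(\nu+t^2)}(\nu/2, 1/2)$ with $\nu = n-2$, where $I$ is the regularized incomplete beta function. The question does not state its sample sizes, so the ones used here are invented for illustration.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges quickly; otherwise use symmetry
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def corr_t_test(r, n):
    """Return (t, two-sided p) for H0: population correlation = 0 (hypothetical helper)."""
    if not -1.0 < r < 1.0 or n < 3:
        raise ValueError("need |r| < 1 and n >= 3")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    # P(|T| >= |t|) for T ~ t_df equals I_{df/(df+t^2)}(df/2, 1/2)
    p = betainc_reg(df / 2.0, 0.5, df / (df + t * t))
    return t, p


# Hypothetical sample sizes chosen to mirror the question's correlations
for r, n in [(0.4, 30), (0.7, 5)]:
    t, p = corr_t_test(r, n)
    print(f"r={r}, n={n}: t={t:.2f}, p={p:.3f}")
```

With these made-up sizes, the weaker correlation from 30 points clears the 5% threshold while the stronger one from 5 points does not. In the extreme case $n = 4$ the formula reduces to $p = 1 - |r|$, so even $r = 0.9$ gives $p = 0.1$.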

Bear in mind that this really tells you what you can claim based on a sample: a large $p$-value does not necessarily mean that the correlation is precisely zero, only that your data cannot tell you whether it is or isn't.
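One way to see what "can't say" means is to look at a confidence interval for the population correlation instead of only the $p$-value. The sketch below swaps in a different technique from the $t$ test described above: Fisher's $z$-transformation, $z = \operatorname{artanh}(r)$, which is approximately normal with standard error $1/\sqrt{n-3}$ for roughly bivariate-normal data. The function name `fisher_ci` and the sample size $n = 5$ are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for the population correlation (Fisher z)."""
    if not -1.0 < r < 1.0 or n < 4:
        raise ValueError("need |r| < 1 and n >= 4")
    z = math.atanh(r)                   # map r to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)         # approximate standard error of z
    zcrit = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Build the interval on the z scale, then map it back to the correlation scale
    return math.tanh(z - zcrit * se), math.tanh(z + zcrit * se)


lo, hi = fisher_ci(0.7, 5)
print(f"95% CI for rho: ({lo:.2f}, {hi:.2f})")
```

For $r = 0.7$ from five points, the interval runs from a moderate negative correlation to nearly 1. It contains zero, so $H_0$ cannot be rejected, but it also contains very strong correlations, which is why a large $p$-value is not evidence that the correlation is zero.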

This is what MATLAB's corr, SciPy's scipy.stats.mstats.pearsonr, and R's cor.test compute by default. There are, of course, other tests one can run on correlations (e.g., to compare two correlations), so check your software's documentation to make sure which test you are getting.
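Saying that one correlation is larger than another calls for a test of their difference, not a comparison of two separate $p$-values. A minimal sketch, assuming the two correlations come from independent samples of roughly bivariate-normal data, uses Fisher's $z$-transformation (one common choice; correlations computed on the same sample need a different test). The name `compare_independent_corrs` and the sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def compare_independent_corrs(r1, n1, r2, n2):
    """Two-sided z test of H0: rho1 == rho2 for correlations from independent samples."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    # Standard error of the difference on the Fisher z scale
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


z, p = compare_independent_corrs(0.7, 5, 0.4, 30)
print(f"z={z:.2f}, p={p:.2f}")
```

With these made-up inputs there is no evidence that the two population correlations differ, even though one sample correlation is "significant" on its own and the other is not.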
