Solved – Threshold for correlation coefficient to indicate statistical significance of a correlation in a correlation matrix

correlation, multiple-comparisons, statistical-significance

I have computed a correlation matrix of a data set which contains 455 data points, each data point containing 14 characteristics. So the dimension of the correlation matrix is 14 x 14.

I was wondering whether there is a threshold for the value of the correlation coefficient which points out that there is a significant correlation between two of those characteristics.

I have values ranging from -0.2 to 0.85, and I was thinking that the important ones are those above 0.7.

  • Is there a general value of the correlation coefficient that should be used as the threshold, or does it depend on the context and the type of data I am investigating?

Best Answer

Significance tests for correlations

There are tests of statistical significance that can be applied to individual correlations. They give the probability of obtaining a correlation as large as or larger than the sample correlation, assuming the null hypothesis is true.

The key point is that what constitutes a statistically significant correlation coefficient depends on:

  • Sample size: bigger sample sizes will lead to smaller thresholds
  • alpha: often set to .05, smaller alphas will lead to higher thresholds for statistical significance
  • one-tailed / two-tailed test: I'm guessing that you would be using two-tailed so this probably doesn't matter
  • type of correlation coefficient: I'm guessing you are using Pearson's
  • distributional assumptions of x and y

In common circumstances, where alpha is .05, the test is two-tailed, the coefficient is Pearson's correlation, and normality is at least an adequate approximation, the main factor influencing the cut-off is sample size.
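To make the dependence on sample size concrete, here is a minimal sketch of the cut-off calculation. It assumes the Fisher z-transformation approximation (atanh(r) is roughly normal with standard error 1/sqrt(n - 3) under the null hypothesis), which is a stand-in for the exact t-based test and is close to it at moderate or large n. The function name `critical_r` is just an illustrative choice.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n, alpha=0.05, two_tailed=True):
    """Approximate smallest |r| that is significant at level alpha.

    Assumption: Fisher z approximation, i.e. atanh(r) ~ Normal(0, 1/(n - 3))
    when the true correlation is zero. Not the exact t-based cut-off.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    # Split alpha across both tails for a two-tailed test.
    tail = alpha / 2 if two_tailed else alpha
    z_crit = NormalDist().inv_cdf(1 - tail)
    # Map the critical value back from the z scale to the r scale.
    return tanh(z_crit / sqrt(n - 3))


# The cut-off shrinks as the sample size grows.
for n in (30, 100, 455):
    print(n, round(critical_r(n), 3))
```

With n = 455 the cut-off comes out at roughly 0.09, so under these assumptions even quite small correlations in your matrix would be individually significant, well below the 0.7 you had in mind.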

Threshold of importance

Another way of interpreting your question is to consider that you are interested not in whether a correlation is statistically significant, but rather whether it is practically important.

Some researchers have offered rules of thumb for interpreting the meaning of correlation coefficients, but these rules of thumb are domain specific.

Multiple significance testing

However, because you are interested in flagging significant correlations in a matrix, the inferential context changes. You have $k(k-1)/2$ correlations, where $k$ is the number of variables (i.e., $14(13)/2=91$). If the null hypothesis were true for all correlations in the matrix, then the more significance tests you run, the more likely you are to make at least one Type I error. E.g., in your case you would on average make $91 \times .05 = 4.55$ Type I errors if the null hypothesis were true for all correlations.
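The arithmetic above can be sketched in a few lines. The Bonferroni adjustment shown at the end is my own illustrative addition: it is one common and conservative correction, not the only option, and the cut-off again relies on the Fisher z approximation rather than the exact t-based test.

```python
from math import sqrt, tanh
from statistics import NormalDist

k = 14        # number of variables
n = 455       # number of data points
alpha = 0.05  # per-test significance level

# Number of distinct pairwise correlations in a k x k matrix.
n_tests = k * (k - 1) // 2

# Expected number of Type I errors if every null hypothesis were true.
expected_type_i = n_tests * alpha


def critical_r(n, alpha):
    """Approximate two-tailed cut-off for |r| (Fisher z approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return tanh(z_crit / sqrt(n - 3))


# Assumption: Bonferroni correction, which tests each correlation at alpha / n_tests.
alpha_bonferroni = alpha / n_tests
r_unadjusted = critical_r(n, alpha)
r_bonferroni = critical_r(n, alpha_bonferroni)

print(n_tests, round(expected_type_i, 2))
print(round(r_unadjusted, 3), round(r_bonferroni, 3))
```

Under these assumptions the adjusted cut-off rises from about 0.09 to about 0.16, which is still far below 0.7. That is a reminder that statistical significance and practical importance are separate questions.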

As @user603 has pointed out, these issues were well discussed in this earlier question.

In general, I find it useful when interpreting a correlation matrix to focus on higher-level structure. This can be done informally by looking at general patterns in the correlation matrix, or more formally by using techniques like PCA and factor analysis. Such approaches avoid many of the issues associated with multiple significance testing.
