How to interpret a moderately negative correlation

correlation, spearman-rho

I've conducted Spearman's rank-order correlation tests on a set of data. The Score that a participant got on a test is the independent variable, and their Responses to a Likert-type question are the dependent variable. I wanted to see whether there was a relationship between the Score and the Responses to any of the questions. There were 19 questions. I found only one significant correlation, between the Score and Question 5, and it looked like this:

r = -.408, p = .003

So it's showing a moderate negative correlation, implying that as participants' scores on the test go up, their responses to Question 5 go down. Because Question 5 is Likert-type, with 5 being the best and 1 being the worst, this means that participants who scored higher tend to rate Question 5 worse.

This bit is fine. What I don't know how to interpret is the moderate correlation. Because of the low p-value, I guess that means that the test is reliable, but what does it mean to have a reliable moderate correlation? Perhaps I'm overthinking this, but I just don't know how to make sense of this result.

Best Answer

A couple of caveats before moving on to the actual question you asked:

First, because you ran 19 tests (and, really, whenever you run more than one test), you should adjust for multiple comparisons. If you perform 20 independent tests of null hypotheses that are in fact true, you'd expect to get, on average, one p-value $\leq 0.05$, with a distinct possibility of getting more. So your overall probability of rejecting at least one true null hypothesis is actually quite high.
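To put rough numbers on this, here is a minimal sketch. It assumes the tests are independent, which yours are not exactly (all 19 correlations share the same Score column), so treat the figures as approximate. The function names are hypothetical helpers for illustration.

```python
def expected_false_positives(m: int, alpha: float = 0.05) -> float:
    """Expected number of p-values <= alpha when all m null hypotheses are true."""
    return m * alpha


def familywise_error_rate(m: int, alpha: float = 0.05) -> float:
    """P(at least one p-value <= alpha) for m independent tests of true nulls."""
    return 1 - (1 - alpha) ** m


for m in (1, 19, 20):
    print(
        f"m={m:2d}: expected false positives = {expected_false_positives(m):.2f}, "
        f"P(at least one) = {familywise_error_rate(m):.3f}"
    )
```

Under the independence assumption, 19 tests at $\alpha = 0.05$ give roughly a 62% chance of at least one false positive.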

One well-respected procedure you can use is the Benjamini-Hochberg procedure. You would need to choose a false discovery rate (FDR), that is, a desired maximum expected proportion of "discoveries" (i.e., rejections of the null hypothesis) that are false (i.e., the null hypothesis in those cases is actually true). The procedure sorts the $m$ p-values from smallest to largest and compares the $k$-th smallest with $k \cdot FDR / m$. Your single "significant" correlation has the smallest p-value, so its criterion becomes $FDR/19$, which is about 0.0026 if you set the FDR at 0.05. Because $p = 0.003$ exceeds this, and none of your other p-values fell below 0.05, you would conclude that you could not reject any of the null hypotheses at an FDR of 0.05.
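A minimal sketch of the Benjamini-Hochberg step-up rule in plain Python follows. Only the 0.003 for Question 5 comes from your post; the other 18 p-values are invented placeholders, all above 0.1, to match your report that no other question reached significance. `benjamini_hochberg` is a hypothetical helper, not a library function.

```python
def benjamini_hochberg(pvalues, fdr):
    """Benjamini-Hochberg step-up rule: True where the null hypothesis is rejected."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= k * fdr / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * fdr / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Question 5 (index 4) uses the reported p = 0.003; the other 18 values are
# invented placeholders, all above 0.1.
pvals = [0.41, 0.27, 0.88, 0.12, 0.003, 0.65, 0.33, 0.74, 0.19, 0.52,
         0.91, 0.46, 0.23, 0.58, 0.37, 0.81, 0.15, 0.69, 0.29]

for fdr in (0.05, 0.10):
    rejected = benjamini_hochberg(pvals, fdr)
    hits = [f"Q{i + 1}" for i, r in enumerate(rejected) if r]
    print(f"FDR={fdr}: threshold for smallest p = {fdr / len(pvals):.4f}, rejected: {hits}")
```

At an FDR of 0.05 nothing is rejected; at 0.1 only Question 5 is. In practice you would more likely call an existing implementation, such as `multipletests(pvals, alpha=0.05, method="fdr_bh")` from `statsmodels.stats.multitest`. Note that the procedure's FDR guarantee assumes the tests are independent or positively dependent.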

Having said that, suppose you set the FDR at 0.1 and consequently did reject the null hypothesis for the correlation above (0.003 is below $0.1/19 \approx 0.0053$). Now, why is that correlation relatively large in absolute value? When you compare several correlations, as you did here, the largest ones probably got that way through some combination of a genuinely larger underlying value and favorable randomness, to put it loosely. This brings us to the second caveat: you can't trust the largest observed effect size to be an unbiased estimate of its true effect, because it tends to overstate it. There are ways of correcting for this, too, but I won't go into them here; they are more complex than FDR control and, depending on your data, may not make much difference.
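The selection effect shows up in a small simulation. This is a sketch under assumed values that do not come from your data: 50 participants, all 19 questions sharing the same true correlation of -0.2 with the Score, and, for simplicity, Pearson's r on normally distributed data instead of Spearman's rho on Likert responses. For each simulated study it keeps only the correlation that is largest in absolute value, then averages those across studies.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_selected_correlation(n=50, n_questions=19, true_r=-0.2, reps=500, seed=1):
    """Average, over simulated studies, of the correlation largest in |r|."""
    rng = random.Random(seed)
    noise_sd = math.sqrt(1 - true_r ** 2)  # keeps each question's variance at 1
    selected = []
    for _ in range(reps):
        score = [rng.gauss(0, 1) for _ in range(n)]
        rs = []
        for _ in range(n_questions):
            # Every question has the same population correlation, true_r, with the score.
            answers = [true_r * s + noise_sd * rng.gauss(0, 1) for s in score]
            rs.append(pearson(score, answers))
        selected.append(max(rs, key=abs))  # keep the "most impressive" question
    return statistics.fmean(selected)


avg = mean_selected_correlation()
print(f"true r = -0.2, average of the largest |r| out of 19 = {avg:.2f}")
```

The average of the selected correlations lands well beyond -0.2, even though every question has the same true correlation: picking the most extreme estimate inflates it.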

On to the question! What does a moderate correlation mean? I suspect you are overthinking the issue to some extent. It simply means that there is some relationship between the two variables, but there is also a lot of randomness affecting one or both of them, or perhaps other variables are influencing them as well. So the direct relationship isn't strong, but it is certainly noticeable. Plots help:

[Figure: scatter plot of two variables with a correlation of about -0.44]

These two variables have a correlation of -0.44, about the same as yours. You can see there is a relationship, but it's certainly not a strong one, nor is it so weak it can be ignored. Hence, "moderate".
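Data with a similar strength of relationship can be simulated to get a feel for it. This sketch uses hypothetical data: 500 simulated participants whose latent rating has a true correlation of -0.44 with their score, coarsened to a 1-5 scale with made-up cut points to mimic a Likert item. Spearman's rho is computed as Pearson's correlation of the ranks, with tied ratings given their average rank, which is the usual way ties are handled. The helper names are hypothetical.

```python
import math
import random
import statistics


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def to_likert(v):
    """Map a standard-normal value to 1..5 using hypothetical cut points."""
    return 1 + sum(v > c for c in (-1.5, -0.5, 0.5, 1.5))


rng = random.Random(42)
n, true_r = 500, -0.44
score = [rng.gauss(0, 1) for _ in range(n)]
# Latent attitude with population correlation true_r with the score.
latent = [true_r * s + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1) for s in score]
rating = [to_likert(v) for v in latent]

rho = spearman(score, rating)
print(f"Spearman rho = {rho:.2f}, rho^2 = {rho ** 2:.2f}")
```

Squaring the coefficient gives one rough gauge of strength. For your value, $(-0.408)^2 \approx 0.17$, so a straight line fitted to the ranks accounts for roughly 17% of their variance and leaves the rest to other factors and noise. That is the sense in which the relationship is noticeable but not strong.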