Solved – How to interpret a moderately negative correlation

correlation, spearman-rho

I've conducted a Spearman's rank-order correlation test on a set of data. The Score that a participant got on a test is the independent variable, and their Responses to a Likert-type question are the dependent variable. I wanted to see whether there was a relationship between the Score and the Responses to any of the questions. There were 19 questions. I found only one significant correlation, between the Score and Question 5, and it looked like this:

r = -.408, p = .003

So it's showing a moderate negative correlation, implying that as participants' scores on the test go up, their responses to Question 5 go down. As Question 5 is Likert-type, with 5 being the best and 1 being the worst, this means that participants who scored higher tend to rate Question 5 worse.

This bit is fine. What I don't know how to interpret is the moderate correlation. Because of the low p-value, I guess that means that the test is reliable, but what does it mean to have a reliable moderate correlation? Perhaps I'm overthinking this, but I just don't know how to make sense of this result.

Best Answer

A couple of caveats before moving on to the actual question you asked:

First, with 19 tests (indeed, with any number of tests greater than one) you should be adjusting for multiple comparisons. If you perform 20 independent tests of null hypotheses that are in fact true, you'd expect to get, on average, one p-value $\leq 0.05$, with a distinct possibility of getting more. The probability of rejecting at least one true null hypothesis is then $1 - 0.95^{20} \approx 0.64$, far above the nominal 0.05.
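A quick way to see the size of the problem is to compute the expected number of false positives and the family-wise error rate directly. This is a minimal Python sketch that assumes the tests are independent and every null hypothesis is true; your 19 questions are probably correlated with each other, so treat the numbers as a rough guide only.

```python
# Sketch: multiple-testing arithmetic, assuming m independent tests whose
# null hypotheses are all true (a simplification for correlated items).


def expected_false_positives(m: int, alpha: float = 0.05) -> float:
    """Expected number of p-values <= alpha when all m nulls are true."""
    return m * alpha


def familywise_error_rate(m: int, alpha: float = 0.05) -> float:
    """Probability of at least one p-value <= alpha among m independent true nulls."""
    return 1 - (1 - alpha) ** m


for m in (19, 20):
    print(f"m = {m}: expected false positives = {expected_false_positives(m):.2f}, "
          f"P(at least one) = {familywise_error_rate(m):.3f}")
```

For 19 tests this gives an expected 0.95 false positives and roughly a 62% chance of at least one.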

One well-respected procedure you can use is the Benjamini-Hochberg procedure. You choose a false discovery rate (FDR), that is, a desired maximum expected proportion of "discoveries" (i.e., rejections of the null hypothesis) that are false (i.e., the null hypothesis in those cases is actually true). The procedure sorts the $m$ p-values in ascending order and compares the $k$-th smallest with $k \cdot FDR/m$. Your single "significant" correlation has the smallest p-value, so its criterion becomes $FDR/19$, which equals 0.0026 if you set the FDR at 0.05. Since $0.003 > 0.0026$, you would conclude that you could not reject any of the null hypotheses at an FDR of 0.05.
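The Benjamini-Hochberg step-up rule fits in a few lines of Python. The function name `benjamini_hochberg` is only an illustrative choice, not a library API, and statistical packages offer this adjustment ready-made. Only the p-value of 0.003 comes from the question; the threshold check assumes, as above, that the other 18 p-values all exceed 0.05, in which case none of them can pass at a higher rank either.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float], fdr: float = 0.05) -> List[bool]:
    """Reject/keep flag per p-value under the BH step-up procedure.

    Sort p-values ascending, find the largest rank k with p_(k) <= k * fdr / m,
    then reject the hypotheses with the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * fdr / m:
            k_max = rank  # keep the largest passing rank (step-up)
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Criterion faced by the smallest of 19 p-values (rank 1): fdr * 1 / 19.
n_tests = 19
for fdr in (0.05, 0.10):
    threshold = 1 * fdr / n_tests
    print(f"FDR = {fdr:.2f}: rank-1 threshold = {threshold:.4f}; "
          f"0.003 passes? {0.003 <= threshold}")
```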

Having written that, let's assume you set the FDR at 0.1 (the criterion then becomes $0.1/19 \approx 0.0053$, which 0.003 meets) and consequently did reject the null hypothesis for the correlation above. Now, why is that correlation relatively large (in absolute value)? When you're comparing several estimates, correlations in this case, the largest ones probably got that way through some combination of their true underlying values and random sampling error, writing very loosely. This brings us to the second caveat: you can't trust the largest observed effect size to be an unbiased estimate of its true effect, because selecting the most extreme estimate tends to exaggerate it. There are ways of correcting for this, too, but I won't go into them here; they are more complex than FDR control and, depending on your data, may not make much difference.
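A small simulation makes the second caveat concrete. It is a sketch under made-up assumptions, none of which come from the question: 50 participants, 19 items that all share the same true correlation of -0.2 with the score, normally distributed data, and Pearson rather than Spearman correlations for simplicity. In each simulated study it keeps the most negative of the 19 sample correlations, then averages those selected values.

```python
import math
import random
import statistics


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def most_negative_r(rng, n, n_items, true_rho):
    """One simulated study: every item has the same true correlation with the
    score; return the most negative of the n_items sample correlations."""
    score = [rng.gauss(0, 1) for _ in range(n)]
    noise_sd = math.sqrt(1 - true_rho ** 2)
    sample_rs = []
    for _ in range(n_items):
        item = [true_rho * s + rng.gauss(0, noise_sd) for s in score]
        sample_rs.append(pearson(score, item))
    return min(sample_rs)


def mean_selected_r(n=50, n_items=19, true_rho=-0.2, reps=300, seed=1):
    """Average, over many simulated studies, of the most extreme correlation."""
    rng = random.Random(seed)
    return statistics.fmean([most_negative_r(rng, n, n_items, true_rho)
                             for _ in range(reps)])


print(f"true rho = -0.20; average of the selected (most negative) r = "
      f"{mean_selected_r():.2f}")
```

The average of the selected correlations lands well beyond -0.2, even though no item is truly more strongly related to the score than any other.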

On to the question! What does a moderate correlation mean? I suspect you are overthinking the issue to some extent. It simply means that there is some relationship between the two variables in question, but there is also a lot of randomness affecting one or both of them, or perhaps other variables affect both. So the direct relationship isn't strong, but it's certainly noticeable. Plots help:

[Figure: scatter plot of two variables with a correlation of -0.44]

These two variables have a correlation of -0.44, about the same as yours. You can see there is a relationship, but it's certainly not a strong one, nor is it so weak it can be ignored. Hence, "moderate".
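If you want to draw a picture like this yourself, you can generate bivariate normal data with a chosen correlation and plot it with any tool. This Python sketch assumes normally distributed variables and a Pearson correlation; under those assumptions $r^2 \approx 0.19$, so the linear relationship accounts for only about a fifth of the variance, which is another way of reading "moderate".

```python
import math
import random


def correlated_sample(n, rho, seed=0):
    """Draw n pairs from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return xs, ys


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


xs, ys = correlated_sample(5000, -0.44)
r = pearson(xs, ys)
# Plot xs against ys with any plotting tool to see the "moderate" cloud.
print(f"sample r = {r:.2f}; share of variance explained r^2 = {r * r:.2f}")
```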

Related Solutions

Solved – About correlation of ordinal variables having different number of categories and about correlation of mixed type of variables

1.) I think the nonparametric correlation methods, Spearman's or Kendall's, can be used.

2.) Reversing the order of the category codes changes only the sign of the correlation, not its magnitude, so changing the order is not necessary.

3.) The nonparametric methods require only that the data can be ordered, so they can be applied when one variable is ordinal and the other is on an interval scale.
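Point 2 is easy to check: reversing the coding flips the sign of every pairwise comparison on that variable, which negates the numerator of a rank correlation while leaving its denominator unchanged. This Python sketch writes Kendall's $\tau_b$ as a correlation of pairwise signs; the data are hypothetical.

```python
import math


def sign(v):
    """-1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_b(x, y):
    """tau-b = sum(a*b) / sqrt(sum(a^2) * sum(b^2)) over all pairs i < j,
    with a = sign(x_i - x_j) and b = sign(y_i - y_j)."""
    num = sx = sy = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            a = sign(x[i] - x[j])
            b = sign(y[i] - y[j])
            num += a * b   # +1 concordant, -1 discordant, 0 tied
            sx += a * a    # counts pairs not tied on x
            sy += b * b    # counts pairs not tied on y
    return num / math.sqrt(sx * sy)


# Hypothetical data: a 3-category ordinal variable and a 5-point Likert item.
x = [1, 1, 2, 2, 2, 3, 3, 3]
y = [2, 1, 3, 4, 2, 5, 4, 3]
y_reversed = [6 - v for v in y]  # recode 1..5 as 5..1
print(kendall_tau_b(x, y), kendall_tau_b(x, y_reversed))
```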

Solved – How to do a correlation between Likert scale and an ordinal categorical measure

What about one of the Kendall's $\tau$s? They are a kind of rank correlation coefficient for ordinal data.

Here's an example with Stata and $\tau_{b}$. A value of $−1$ implies perfect negative association, and $+1$ indicates perfect agreement. Zero indicates the absence of association. Here we see a modest, though significant, negative association between speed limits and accidents.

. webuse hiway, clear
(Minnesota Highway Data, 1973)

. tab spdlimit rate, taub

           |    Accident rate per million
     Speed |          vehicle miles
     Limit |   Below 4        4-7    Above 7 |     Total
-----------+---------------------------------+----------
        40 |         1          0          0 |         1 
        45 |         1          1          1 |         3 
        50 |         1          4          2 |         7 
        55 |        10          4          1 |        15 
        60 |         9          2          0 |        11 
        65 |         1          0          0 |         1 
        70 |         1          0          0 |         1 
-----------+---------------------------------+----------
     Total |        24         11          4 |        39 

          Kendall's tau-b =  -0.4026  ASE = 0.116
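To see where the tau-b figure comes from, here is a Python sketch that rebuilds the 39 highways from the cell counts in the table and classifies every pair, then applies $\tau_b = (C - D)/\sqrt{(n_0 - n_1)(n_0 - n_2)}$, where $n_0 = n(n-1)/2$ and $n_1$, $n_2$ count pairs tied on speed limit and on rate. It illustrates the formula; it is not how Stata computes it internally.

```python
import math

# Cell counts copied from the tab output: one row per speed limit, columns are
# the ordered rate categories Below 4 (0), 4-7 (1), Above 7 (2).
SPEED_LIMITS = [40, 45, 50, 55, 60, 65, 70]
COUNTS = [
    [1, 0, 0],
    [1, 1, 1],
    [1, 4, 2],
    [10, 4, 1],
    [9, 2, 0],
    [1, 0, 0],
    [1, 0, 0],
]


def expand(limits, counts):
    """One (speed limit, rate category) tuple per highway."""
    return [(limit, cat)
            for limit, row in zip(limits, counts)
            for cat, count in enumerate(row)
            for _ in range(count)]


def classify_pairs(obs):
    """Count concordant and discordant pairs, plus pairs tied on x and on y."""
    conc = disc = tied_x = tied_y = 0
    for a in range(len(obs)):
        for b in range(a + 1, len(obs)):
            dx = obs[a][0] - obs[b][0]
            dy = obs[a][1] - obs[b][1]
            tied_x += int(dx == 0)
            tied_y += int(dy == 0)
            if dx * dy > 0:
                conc += 1
            elif dx * dy < 0:
                disc += 1
    return conc, disc, tied_x, tied_y


highways = expand(SPEED_LIMITS, COUNTS)
n0 = len(highways) * (len(highways) - 1) // 2
C, D, n1, n2 = classify_pairs(highways)
tau_b = (C - D) / math.sqrt((n0 - n1) * (n0 - n2))
print(f"n = {len(highways)}, C = {C}, D = {D}, tau-b = {tau_b:.4f}")
```

This gives 62 concordant and 253 discordant pairs and reproduces the -0.4026 above.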

You can also try an asymmetric modification of $\tau_{b}$ that corrects only for ties on the independent variable. This is called Somers' D:

. somersd rate spdlimit
Somers' D with variable: rate
Transformation: Untransformed
Valid observations: 39

Symmetric 95% CI
------------------------------------------------------------------------------
             |              Jackknife
        rate |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    spdlimit |  -.4727723   .1395719    -3.39   0.001    -.7463282   -.1992163
------------------------------------------------------------------------------

All these measures of association are related in that they classify all pairs of observations (highways in our example) as concordant or discordant. A pair is concordant if the observation with the larger value of variable $X$ (speed limit) also has the larger value of variable $Y$ (accident rate), and discordant if it has the smaller value; pairs tied on either variable are neither. There are more of these measures than you can shake a stick at (one more is Goodman and Kruskal's $\gamma$, which, like $\tau_{a}$, makes no correction for ties: $\gamma$ divides by the number of untied pairs, $\tau_{a}$ by the number of all pairs). They will generally yield similar conclusions, even if they are not directly comparable.
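Because these measures share the same pair counts, they differ mainly in their denominators. The Python sketch below works directly from the cell counts of the speed-limit table (rather than expanding to individual highways) and computes $\tau_a$, $\tau_b$, $\gamma$ and Somers' D. For Somers' D, the denominator that reproduces the -.4727723 in the `somersd rate spdlimit` output is the number of pairs not tied on rate.

```python
import math

# Cell counts from the speed-limit table: rows are 40, 45, ..., 70 mph;
# columns are the ordered rate categories Below 4, 4-7, Above 7.
COUNTS = [
    [1, 0, 0],
    [1, 1, 1],
    [1, 4, 2],
    [10, 4, 1],
    [9, 2, 0],
    [1, 0, 0],
    [1, 0, 0],
]


def pair_summary(counts):
    """Concordant/discordant pairs and ties, computed cell by cell."""
    cells = [(i, j, c) for i, row in enumerate(counts)
             for j, c in enumerate(row) if c > 0]
    conc = disc = 0
    for i1, j1, c1 in cells:
        for i2, j2, c2 in cells:
            if i2 > i1 and j2 > j1:      # higher speed limit, higher rate
                conc += c1 * c2
            elif i2 > i1 and j2 < j1:    # higher speed limit, lower rate
                disc += c1 * c2
    n = sum(c for _, _, c in cells)
    n0 = n * (n - 1) // 2
    tied_speed = sum(t * (t - 1) // 2 for t in map(sum, counts))
    tied_rate = sum(t * (t - 1) // 2 for t in map(sum, zip(*counts)))
    return conc, disc, n0, tied_speed, tied_rate


C, D, n0, n1, n2 = pair_summary(COUNTS)
tau_a = (C - D) / n0                                 # all pairs in denominator
tau_b = (C - D) / math.sqrt((n0 - n1) * (n0 - n2))   # corrects for ties on both
gamma = (C - D) / (C + D)                            # tied pairs dropped entirely
somers_d = (C - D) / (n0 - n2)                       # excludes pairs tied on rate
print(f"tau-a = {tau_a:.4f}, tau-b = {tau_b:.4f}, "
      f"gamma = {gamma:.4f}, Somers' D = {somers_d:.4f}")
```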

The results above are qualitatively in line with Spearman's rank correlation coefficient mentioned by Greg (which tends to be larger in absolute value than $\tau$):

. ci2 rate spdlimit, spearman

Confidence interval for Spearman's rank correlation 
of rate and spdlimit, based on Fisher's transformation.
Correlation = -0.451 on 39 observations (95% CI: -0.671 to -0.158)
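The interval printed by `ci2` matches the usual Fisher transformation: $z = \operatorname{atanh}(r)$ with standard error $1/\sqrt{n-3}$, back-transformed with $\tanh$. A minimal Python sketch, assuming the rounded coefficient -0.451 and the normal critical value for a 95% interval:

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate CI for a correlation via Fisher's z = atanh(r), SE = 1/sqrt(n - 3)."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


lo, hi = fisher_ci(-0.451, 39)
print(f"95% CI: {lo:.3f} to {hi:.3f}")
```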

This measure does not consider pairs; instead, it compares the orderings you would get if you used each variable separately to rank the observations (Stata gives tied observations the average of their ranks, and the coefficient is then just the Pearson correlation of the ranks). This makes it somewhat faster to compute, since you don't have to consider all $\frac{n(n-1)}{2}$ pairs. On the other hand, the sampling distribution of $\tau$ approaches normality much faster, so if you plan to do inference, that measure might be better. $\tau_b$ is the most common variant.
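The average-rank procedure is short enough to write out. This Python sketch ranks each variable, giving tied observations the mean of the positions they occupy, then takes the Pearson correlation of the ranks. Applied to the 39 highways rebuilt from the speed-limit table, it gives about -0.451, as `ci2` reported. The helper names are illustrative, not a Stata or library API.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # positions pos..end are 0-based
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Rebuild the 39 highways: speed limit -> counts per rate category (0, 1, 2).
COUNTS = {40: [1, 0, 0], 45: [1, 1, 1], 50: [1, 4, 2], 55: [10, 4, 1],
          60: [9, 2, 0], 65: [1, 0, 0], 70: [1, 0, 0]}
speed, rate = [], []
for limit, row in COUNTS.items():
    for category, count in enumerate(row):
        speed += [limit] * count
        rate += [category] * count

print(f"Spearman's rho = {spearman(rate, speed):.3f}")
```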
