Solved – Spearman correlation significant but scatter plot not

correlation, data visualization, scatterplot, spearman-rho

For my data (based on 5-item numeric Likert scales), I have tried to calculate correlations. As the data are not normally distributed (according to a Shapiro-Wilk test), I have used Spearman correlations.

For most of my variables, the Spearman correlation results and the scatter plots agree: the results are not significant and I can't see any relationship in the scatter plots. That is unfortunate for my hypotheses, but at least not confusing. However, in one case the scatter plot looks like there is no relationship, but the Spearman correlation is significant (and negative, which also goes very much against my hypothesis and the previous literature).

So now I'm just really confused about:

  1. whether I've done it right, and
  2. which result is right?

EDIT: I have now added jitter to the plot; please see the attached picture below. However, I am still not quite sure what the right answer is: is the correlation right, or the scatter plot (or does the relationship exist and I just can't see it in the plot)?

EDIT 2: I have now ranked the data first and then added jitter to the new plot. It looks very similar to the non-ranked one though, so I'm not sure if I did this right! As I can only have two pictures in here, please let me know if you need the correlations back and I will gladly add them instead.

Scatter Plot with Jitter and Ranks

Scatter Plot with Jitter

Best Answer

There are two points to bear in mind here:

  1. The Spearman correlation is based on ranks, not the original data.
  2. Many of your data points are plotted on top of each other, so there is a lot of mass at certain points, but you can't see that in your plot.
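To make both points concrete, here is a minimal Python sketch using made-up data (not yours; the cell counts and the helper names `average_ranks`, `pearson` and `spearman` are just illustrative choices). It expands a small table of hypothetical Likert response pairs into 180 observations, counts how many distinct positions a plain scatter plot would show, and computes Spearman's rho as the Pearson correlation of the tie-averaged ranks.

```python
from collections import Counter
from statistics import mean

def average_ranks(values):
    """Rank from 1..n; tied values share the mean of the positions they occupy."""
    counts = Counter(values)
    rank_of, below = {}, 0
    for v in sorted(counts):
        rank_of[v] = below + (counts[v] + 1) / 2
        below += counts[v]
    return [rank_of[v] for v in values]

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical (x, y) response pairs and how often each one occurs.
cells = {(1, 5): 30, (5, 1): 30, (1, 1): 10, (5, 5): 10, (3, 3): 100}
pairs = [cell for cell, n in cells.items() for _ in range(n)]
x = [p[0] for p in pairs]
y = [p[1] for p in pairs]

rho = spearman(x, y)
distinct_positions = len(set(pairs))

print(f"n = {len(pairs)}, distinct plotted positions = {distinct_positions}")
print(f"Spearman rho = {rho:.2f}")  # -0.50 for this made-up table
```

An unjittered scatter plot of these data would show only five dots in a symmetric X pattern, which looks like no relationship. The counts tell a different story: the (1, 5) and (5, 1) corners hold three times as many observations as the (1, 1) and (5, 5) corners, hence the negative rho.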

The rank issue is probably not as big a deal here, although you might as well convert your data to ranks and plot those. Then you need to jitter your ranks slightly and make the points semi-transparent to better see what's going on. I don't have access to your data, but you can get the idea from my answer here: "How to extract information from a scatterplot matrix when you have large N, discrete data, & many variables?"
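As a rough illustration of the rank-then-jitter step, here is a small Python sketch on made-up responses. The data, the helper names `average_ranks` and `jitter`, the jitter width of 0.3 and the fixed seed are all arbitrary assumptions for the example, not values tuned to your data.

```python
import random
from collections import Counter

def average_ranks(values):
    """Rank from 1..n; tied values share the mean of the positions they occupy."""
    counts = Counter(values)
    rank_of, below = {}, 0
    for v in sorted(counts):
        rank_of[v] = below + (counts[v] + 1) / 2
        below += counts[v]
    return [rank_of[v] for v in values]

def jitter(values, width, rng):
    """Add independent uniform noise in [-width, width] to each value."""
    return [v + rng.uniform(-width, width) for v in values]

# Hypothetical Likert responses; the pair (3, 3) occurs twice.
x = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
y = [4, 5, 3, 3, 3, 2, 3, 2, 1, 2]

rng = random.Random(42)  # fixed seed so the jitter is reproducible
width = 0.3              # arbitrary; keep it below half the gap between rank values

rx, ry = average_ranks(x), average_ranks(y)
jx, jy = jitter(rx, width, rng), jitter(ry, width, rng)

# Tied observations share a rank, but no longer share a plotted position.
for i in (3, 4):
    print(f"rank ({rx[i]}, {ry[i]}) -> jittered ({jx[i]:.2f}, {jy[i]:.2f})")
```

The jittered coordinates can then be passed to whatever plotting tool you use, drawn with partial transparency (in matplotlib, for instance, the `alpha` argument of `scatter`), so that dense regions show up darker than sparse ones.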