Solved – Interpreting a scatter plot of ranks with a particular structure

correlation, scatterplot, spearman-rho

I am conducting a correlation analysis on variables that I cannot assume to be normally distributed, so I am using Spearman's rank correlation instead of Pearson's. The two variables do not seem to be correlated (r = 0.05), yet I am seeing a quite particular structure in the scatterplot of ranks:

[Figure: the scatterplot of ranks]

As you can see, around the middle of the ranks (about 500; there are 1000 data points), there seems to be a positively skewed cloud, which is surrounded by a square (or at least four corner patches) of points.

Is anyone familiar with such a structure? Does this simply mean I should reject outliers on either one of the axes?

Best Answer

If you look at your scatterplot again, you will see that this is not so much the corners of a square as a kind of X pattern. As Maarten Buis notes in the comments, the plot looks like it contains two groups of data points: one with a strong positive relationship between the variables, and one with a slightly weaker negative relationship. If these groups are roughly the same size (which they appear to be), the two correlations cancel out in the aggregate, and so you get a low sample correlation overall.
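To illustrate the cancellation, here is a minimal simulation sketch. The data are entirely made up (two hypothetical groups of 500 points each, with slopes +1 and -1 and noise levels chosen arbitrarily), and it uses only the Python standard library. It is not a reconstruction of your data, just a demonstration that two opposite within-group rank correlations can pool to something near zero.

```python
import random

def ranks(values):
    """1-based ranks; assumes no ties, which holds for continuous draws."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 500                 # hypothetical group size

# Group A: positive relationship. Group B: somewhat noisier negative one.
xa = [rng.gauss(0, 1) for _ in range(n)]
ya = [x + rng.gauss(0, 0.5) for x in xa]
xb = [rng.gauss(0, 1) for _ in range(n)]
yb = [-x + rng.gauss(0, 0.7) for x in xb]

rho_a = spearman(xa, ya)
rho_b = spearman(xb, yb)
rho_all = spearman(xa + xb, ya + yb)  # ranks recomputed on the pooled data

print(f"group A: {rho_a:.2f}  group B: {rho_b:.2f}  pooled: {rho_all:.2f}")
```

Plotting the pooled ranks from such a simulation should give an X-shaped picture similar in spirit to the one in the question.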

In view of this, I would suggest you look for a third, binary variable that might account for the apparent groups in this plot. You could try plotting your data again, colouring the points by each available binary variable in turn, and seeing whether any of them demarcates two groups with positive and negative correlation. If no such variable is available in your data, then it might just be a case where you have a lurking variable that is causing mischief.
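A numerical companion to the colouring idea is to compute Spearman's rho separately within each level of every candidate binary variable and look for one that gives opposite signs. The sketch below does this on simulated data. The variable names `hidden` and `unrelated` and the helper `rho_by_group` are hypothetical, invented for illustration, and the shortcut formula used for rho assumes there are no ties.

```python
import random

def spearman_no_ties(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid only without ties."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

def rho_by_group(x, y, flag):
    """Spearman's rho computed separately for flag == 0 and flag == 1."""
    out = {}
    for level in (0, 1):
        idx = [i for i, f in enumerate(flag) if f == level]
        out[level] = spearman_no_ties([x[i] for i in idx], [y[i] for i in idx])
    return out

# Simulated data (an assumption, not the asker's data): `hidden` flips the
# sign of the relationship, `unrelated` has nothing to do with it.
rng = random.Random(1)
n = 1000
hidden = [rng.randint(0, 1) for _ in range(n)]
unrelated = [rng.randint(0, 1) for _ in range(n)]
x = [rng.gauss(0, 1) for _ in range(n)]
y = [(xi if h else -xi) + rng.gauss(0, 0.5) for xi, h in zip(x, hidden)]

candidates = {"hidden": hidden, "unrelated": unrelated}
results = {name: rho_by_group(x, y, flag) for name, flag in candidates.items()}
for name, res in results.items():
    print(name, {level: round(rho, 2) for level, rho in res.items()})
```

A candidate that splits the data into one strongly positive and one strongly negative group is the kind of variable to look for; a candidate that leaves both groups near zero explains nothing about the pattern.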