Solved – Which correlation should be used for non-normal data: Spearman’s rho versus Kendall’s tau versus Kendall’s tau-b

correlation, r

I'm trying to see if there is a correlation between the height of grass and the height under branches available for grass to grow. I have 227 paired observations:

GrassHeight HeightUnderDebris
0            0
0            0
0            0
8            16
0            0
0            0
0            0
2            2
6            6
0            0
0            0
1            1
0            0
0            0
0            0
8            15
0            0
7            7
15           15

My data are not normally distributed and fail the assumption of bivariate normality:

result<-hzTest(data,cov = TRUE,qqplot = FALSE)
result<-mardiaTest(data,cov = TRUE,qqplot = FALSE)
result<-roystonTest(data,qqplot = FALSE)

Therefore, I need to use Spearman's rho or Kendall's tau. First, Spearman's rho produces a warning message:

cor.test(GrassHeight, HeightUnderDebris, method="spearman")

Spearman's rank correlation rho

data:  GrassHeight and HeightUnderDebris
S = 123090, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho 
0.9368622 

Warning message:
In cor.test.default(GrassHeight, HeightUnderDebris, method = "spearman") :
Cannot compute exact p-value with ties

So I then decided to use Kendall's tau as it can deal with ties:

cor.test(GrassHeight, HeightUnderDebris, method="kendall")

Kendall's rank correlation tau

data:  GrassHeight and HeightUnderDebris
z = 17.202, p-value < 2.2e-16
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau 
0.858494

Firstly, should I be concerned that my data has many zeros? They are important as they reflect that if there is no space under branches, then there is no space for grass growth hence why the grass height is 0.

Secondly, how would you interpret Kendall's results? Is it right that the two variables are uncorrelated at the 0.05 significance level if their correlation coefficient is zero? In this case, tau is 0.858. That is not zero and will be rounded up to 1. Can I say that the two variables are correlated based on this?

Should I rather look at rpudplus and the function rpucor, which now uses Kendall’s tau-b to compute the correlation coefficient?

What post-hoc test can I do to find out the nature of the correlation, i.e: as the height between the ground and branch increases, grass height increases?

Best Answer

Pearson's correlation doesn't assume normality, so you should use it. You really don't need Kendall's tau in your example.
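
On the tau-b part of the question: as far as I know, base R's Kendall estimate already applies the tie correction (that is, it is tau-b), so a separate package should not be needed just for that. A minimal sketch to check this on the 19 rows shown in the question, computing tau-a and tau-b by hand (the names sx, sy, num, tau_a, tau_b and tau_r are mine, for illustration only):

```r
# The 19 paired observations listed in the question
grass  <- c(0,0,0,8,0,0,0,2,6,0,0,1,0,0,0,8,0,7,15)
height <- c(0,0,0,16,0,0,0,2,6,0,0,1,0,0,0,15,0,7,15)
n <- length(grass)

# Sign of every pairwise difference: +1, -1, or 0 for a tie
sx <- sign(outer(grass, grass, "-"))
sy <- sign(outer(height, height, "-"))

# Concordant minus discordant pairs (each pair is counted twice here,
# and also twice in both denominators below, so the factor cancels)
num <- sum(sx * sy)

tau_a <- num / (n * (n - 1))                # no correction for ties
tau_b <- num / sqrt(sum(sx^2) * sum(sy^2))  # tied pairs dropped from the denominator

tau_r <- cor(grass, height, method = "kendall")
print(c(tau_a = tau_a, tau_b = tau_b, cor_kendall = tau_r))
```

If the hand-computed tau_b matches the value from cor(), the estimate you already have is the tie-corrected one.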

In your analysis, you should start off with a simple plot. Like this:

grass  <- c(0,0,0,8,0,0,0,2,6,0,0,1,0,0,0,8,0,7,15)
height <- c(0,0,0,16,0,0,0,2,6,0,0,1,0,0,0,15,0,7,15)
plot(grass, height)

[Figure: scatter plot of grass (x-axis) against height (y-axis)]

This is clearly linear and monotonically increasing, so there was no need for you to run a normality test.

Both Pearson and Spearman give similar results:

cor(grass, height) # 0.9300721
cor(grass, height, method='spearman') # 0.9947245
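
One way to see why the two agree so closely: Spearman's rho is just Pearson's correlation computed on the ranks, with tied values sharing their average rank. A small sketch on the same 19 rows (r_grass, r_height, rho_by_hand and rho_builtin are illustrative names, not anything from your code):

```r
# The 19 paired observations listed in the question
grass  <- c(0,0,0,8,0,0,0,2,6,0,0,1,0,0,0,8,0,7,15)
height <- c(0,0,0,16,0,0,0,2,6,0,0,1,0,0,0,15,0,7,15)

# rank() gives tied values their average rank by default
r_grass  <- rank(grass)
r_height <- rank(height)

rho_by_hand <- cor(r_grass, r_height)                    # Pearson on the ranks
rho_builtin <- cor(grass, height, method = "spearman")   # built-in Spearman

print(c(by_hand = rho_by_hand, builtin = rho_builtin))
```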

In this example, it's not really that important to do a correlation test because you know the p-value will be very small and your result will be significant. But let's do it anyway:

cor.test(grass, height) # p-value = 8.227e-09
cor.test(grass, height, method='spearman', exact=FALSE) # p-value < 2.2e-16

In the first line, we ran a test for Pearson's correlation. The very small p-value gives you confidence that the correlation is not zero (not very surprising).

In the second line, we have a test for Spearman's correlation. Note that the ties in your data (mostly the zeros) make the default exact test impossible, so set exact to FALSE to use the asymptotic t approximation for the p-value (R falls back to it anyway, with the warning you saw). Again, this is very significant.
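
For reference, the t approximation can be reproduced by hand. This is a sketch of my understanding of what cor.test does for Spearman when exact is FALSE and no continuity correction is requested. t_stat, p_manual and p_r are names I made up for the illustration:

```r
# The 19 paired observations listed in the question
grass  <- c(0,0,0,8,0,0,0,2,6,0,0,1,0,0,0,8,0,7,15)
height <- c(0,0,0,16,0,0,0,2,6,0,0,1,0,0,0,15,0,7,15)
n <- length(grass)

rho <- cor(grass, height, method = "spearman")

# Under the null hypothesis, rho * sqrt((n - 2) / (1 - rho^2)) is
# approximately t-distributed with n - 2 degrees of freedom
t_stat   <- rho * sqrt((n - 2) / (1 - rho^2))
p_manual <- 2 * pt(-abs(t_stat), df = n - 2)   # two-sided p-value

p_r <- cor.test(grass, height, method = "spearman", exact = FALSE)$p.value
print(c(manual = p_manual, cor.test = p_r))
```

The sign of rho gives the direction of the association: a positive value means grass height tends to increase as the height available under the branches increases.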