Solved – Which correlation should be used for non-normal data: Spearman’s rho versus Kendall’s tau versus Kendall’s tau-b

correlation, r

I'm trying to see if there is a correlation between the height of grass and the height under branches available for grass to grow. I have 227 paired observations:

GrassHeight HeightUnderDebris
0            0
0            0
0            0
8            16
0            0
0            0
0            0
2            2
6            6
0            0
0            0
1            1
0            0
0            0
0            0
8            15
0            0
7            7
15           15

My data are not normally distributed and fail the assumption of bivariate normality, according to these tests:

library(MVN)  # hzTest, mardiaTest and roystonTest come from older releases of the MVN package
result<-hzTest(data,cov = TRUE,qqplot = FALSE)
result<-mardiaTest(data,cov = TRUE,qqplot = FALSE)
result<-roystonTest(data,qqplot = FALSE)

Therefore, I need to use Spearman's rho or Kendall's tau. First, Spearman's rho results in a warning message:

cor.test(GrassHeight, HeightUnderDebris, method="spearman")

Spearman's rank correlation rho

data:  GrassHeight and HeightUnderDebris
S = 123090, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho 
0.9368622 

Warning message:
In cor.test.default(GrassHeight, HeightUnderDebris, method = "spearman") :
Cannot compute exact p-value with ties

So I then decided to use Kendall's tau as it can deal with ties:

cor.test(GrassHeight, HeightUnderDebris, method="kendall")

Kendall's rank correlation tau

data:  GrassHeight and HeightUnderDebris
z = 17.202, p-value < 2.2e-16
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau 
0.858494

Firstly, should I be concerned that my data has many zeros? They are important as they reflect that if there is no space under branches, then there is no space for grass growth hence why the grass height is 0.

Secondly, how would you interpret Kendall's results? Is it right that the two variables are uncorrelated at the 0.05 significance level if their correlation coefficient is zero? In this case, tau is 0.858. That is not zero and will be rounded up to 1. Can I say that the two variables are correlated based on this?

Should I rather look at rpudplus and the function rpucor, which now uses Kendall’s tau-b to compute the correlation coefficient?

What post-hoc test can I do to find out the nature of the correlation, i.e., whether grass height increases as the height between the ground and the branch increases?

Best Answer

Pearson's correlation doesn't assume normality, so you should use it. You really don't need Kendall's tau in your example.

In your analysis, you should start off with a simple plot, like this:

grass  <- c(0,0,0,8,0,0,0,2,6,0,0,1,0,0,0,8,0,7,15)
height <- c(0,0,0,16,0,0,0,2,6,0,0,1,0,0,0,15,0,7,15)
plot(grass, height)

This is clearly linear and monotonic positive, and so it was pointless for you to do a normality test.

Both Pearson and Spearman give similar results:

cor(grass, height) # 0.9300721
cor(grass, height, method='spearman') # 0.9947245
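
As a side note on why the two coefficients are so close here: Spearman's rho is Pearson's correlation computed on the ranks of the data, with tied values (the many zeros) receiving averaged ranks. A minimal sketch of that check, reusing the 19 sample values above (the variable names rho_builtin and rho_ranks are only illustrative):

```r
grass  <- c(0,0,0,8,0,0,0,2,6,0,0,1,0,0,0,8,0,7,15)
height <- c(0,0,0,16,0,0,0,2,6,0,0,1,0,0,0,15,0,7,15)

# Spearman's rho as returned by cor()
rho_builtin <- cor(grass, height, method = "spearman")

# Pearson's correlation on the ranks; rank() averages ranks for ties by default
rho_ranks <- cor(rank(grass), rank(height))

all.equal(rho_builtin, rho_ranks)
```

Because ties only change the ranks, not the recipe, the coefficient itself is still well defined with many zeros; only the exact p-value is affected.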

In this example, it's not really that important to do a correlation test because you know the p-value will be very small and your result will be significant. But let's do it anyway:

cor.test(grass, height) # p-value = 8.227e-09
cor.test(grass, height, method='spearman', exact=FALSE) # p-value < 2.2e-16

In the first line, we did a test for Pearson's correlation. Your very small p-value gives you confidence that your correlation is not zero (not very surprising).

In the second line, we have a test for Spearman's correlation. Note that the ties created by your zeros made the default exact test impossible, so you'll need to set exact to FALSE to approximate the distribution of your test statistic with a t approximation. Again, this is very significant.
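
On the tau versus tau-b part of the question: to my understanding, cor and cor.test with method='kendall' already apply the tie correction, so the value they return is tau-b whenever ties are present. The sketch below checks this on the 19 sample values by computing tau-b directly from its definition; kendall_tau_b is a hypothetical helper written for this illustration, not a function from any package.

```r
grass  <- c(0,0,0,8,0,0,0,2,6,0,0,1,0,0,0,8,0,7,15)
height <- c(0,0,0,16,0,0,0,2,6,0,0,1,0,0,0,15,0,7,15)

# Hypothetical helper: Kendall's tau-b from its definition.
kendall_tau_b <- function(x, y) {
  idx <- combn(length(x), 2)                 # all pairs i < j
  dx  <- sign(x[idx[1, ]] - x[idx[2, ]])     # 0 when the pair is tied in x
  dy  <- sign(y[idx[1, ]] - y[idx[2, ]])     # 0 when the pair is tied in y
  s   <- sum(dx * dy)                        # concordant minus discordant pairs
  # Denominator counts only the pairs that are not tied in x, and in y
  s / sqrt(sum(dx != 0) * sum(dy != 0))
}

tau_manual  <- kendall_tau_b(grass, height)
tau_builtin <- cor(grass, height, method = "kendall")
all.equal(tau_manual, tau_builtin)
```

If the two values agree, there is no need for a separate package just to obtain tau-b.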

Related Solutions

R – Identifying Issues with Spearman Correlation in Presence of Many Ties

Use a permutation test. You only need to permute one of the variables independently of the other; here, the response is permuted. Because the relationship in the example is strong, only a small number of permutations are needed (1000 in the example below).

As always, the actual statistic is compared to the distribution of permuted statistics. The p-value is the estimate of the tail probability of the permutation distribution relative to the actual statistic. In some cases the test statistic has a discrete distribution, so it's wise to check the frequencies with which (a) the permutation statistics strictly exceed the actual statistic and (b) the permutation statistics equal or exceed the actual statistic. The code illustrates this by splitting the difference.

# x and y are the paired numeric vectors being tested.
test <- function(y) suppressWarnings(cor.test(x, y, method="spearman")$estimate)
rho <- test(y)                                     # Test statistic
p <- replicate(10^3, test(sample(y, length(y))))   # Simulated permutation distribution

p.out <- sum(abs(p) > abs(rho))    # Count of strict (absolute) exceedances
p.at <- sum(abs(p) == abs(rho))    # Count of equalities, if any
(p.out + p.at /2) / length(p)      # Proportion of exceedances: the p-value.

suppressWarnings quiets any complaints from cor.test that it cannot compute an exact p-value due to ties.
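
For a version that runs on its own, here is a minimal sketch of the same permutation test. It makes a few assumptions that are not in the answer above: it uses the 19 sample observations from the question, fixes a random seed, calls cor directly (which gives the same estimate without the warning), and compares statistics with a small tolerance instead of exact equality.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

x <- c(0,0,0,16,0,0,0,2,6,0,0,1,0,0,0,15,0,7,15)  # height under debris
y <- c(0,0,0,8,0,0,0,2,6,0,0,1,0,0,0,8,0,7,15)    # grass height (the permuted response)

spearman <- function(y) cor(x, y, method = "spearman")

rho  <- spearman(y)                            # observed statistic
perm <- replicate(1000, spearman(sample(y)))   # permutation distribution

tol   <- 1e-12                                 # tolerance for floating-point ties
p_out <- sum(abs(perm) > abs(rho) + tol)       # strict exceedances
p_at  <- sum(abs(abs(perm) - abs(rho)) <= tol) # equalities, if any
p_value <- (p_out + p_at / 2) / length(perm)   # split the difference
p_value
```

With a relationship this strong, few or none of the permuted statistics reach the observed one, so the estimated p-value is bounded below only by the number of permutations used.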

Solved – Comparing time series: Pearson correlation, Kendall’s tau b or Spearman’s rho

For time series, some version of the Pearson correlation is most commonly used, in the form of the autocorrelation function (one series correlated with itself at various lags) and the cross-correlation function (two series correlated with each other at various lags). They are correct summaries when all conditional expectations are linear.
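
As a rough illustration of these two functions, the sketch below uses acf and ccf from base R on simulated data. Everything about the data is an assumption made for the example: an AR(1) series with coefficient 0.6, and a second series that is the first one delayed by two steps plus noise.

```r
set.seed(42)  # assumption: fixed seed for reproducibility

# Simulated AR(1) series; y is x delayed by two time steps, plus noise
z <- as.numeric(arima.sim(model = list(ar = 0.6), n = 202))
x <- z[3:202]
y <- z[1:200] + rnorm(200, sd = 0.3)   # y[t] = x[t - 2] + noise

# Autocorrelation of x at lags 0..10 (the lag-0 value is 1 by definition)
ac <- acf(x, lag.max = 10, plot = FALSE)

# Cross-correlation: lag k estimates the correlation of x[t + k] with y[t]
cc <- ccf(x, y, lag.max = 10, plot = FALSE)
peak_lag <- cc$lag[which.max(cc$acf)]  # lag with the largest cross-correlation
peak_lag
```

Both functions are Pearson correlations of lagged copies of the series, so they share its limitation: they summarize linear dependence only.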

If you suspect that may not be the case, you should start with some visualization of the two series! I have not seen any detailed descriptive analysis of two time series; that would be rather interesting ... In R you could play with the function coplot, and you could make scatterplot matrices, replacing what would be one number in each of the two functions above (autocorrelation, cross-correlation) with a scatterplot. You could also look into copulas used with time series.
