Solved – Bootstrapping Spearman vs. Pearson multiple correlations

bootstrap, pearson-r, spearman-rho

I would like to calculate simple correlations. Since my sample size is quite small and the data are not normally distributed, I thought about using Spearman’s correlations. Then I read about bootstrapping as a distribution-free method and, considering my small sample size, I decided to use it with Pearson’s correlation coefficient. However, I recently came across bootstrapping for Spearman correlations and got confused. If bootstrapping solves the problems of the data distribution and the small sample size (i.e. one can use Pearson’s method), why should I bother with a bootstrapped Spearman correlation? Where does the problem lie? Which method should I use? Are there any rules?

Just adding more details:

The whole problem is as follows:
I’d like to test 14 correlations (1 vs. 14). The sample size is fairly small (approximately n=50) and the data are not normally distributed. So I have two problems: one with the correlations themselves and one with the Type I error.
My natural choice was to look at Spearman’s correlations and, in order to check whether they are robust (effect size and significance), to bootstrap them. I followed this comment: http://www.methodspace.com/profiles/blogs/bonferroni-correcting-lots-of-correlations.
I don’t know whether the assumed relationship is linear; from the scatterplot it looks linear, with no evident outliers.
Then, using the bootstrapped p-values, I was thinking about applying the Holm-Bonferroni correction for multiple comparisons and/or calculating q-values (to control the false discovery rate).

So the basic questions are:

1) Is this approach correct?
2) How can I justify (e.g. to a reviewer) choosing Spearman’s or Pearson’s correlations? (The results are different; I’ve just checked.)

thanks!

Best Answer

Do you mean 1 target variable ($Y$) correlated against 14 $X$ variables? If you want to test in a nonparametric way whether two variables, say the target $Y$ and the predicted $\hat{Y}$, are correlated, you can use the bootstrap. Compute the correlation (Spearman or Pearson) of the $(Y,\hat{Y})$ values many times (e.g. 100 or 500), each time on a different resample of the pairs drawn with replacement, and build a distribution from the computed values.

Let $q_p$ denote the $p$th percentile of that distribution, with $\alpha$ expressed in percent (usually $\alpha = 5$). If $0$ lies inside the interval $[q_{\alpha/2},q_{100-\alpha/2}]$, you can say that the correlation coefficient is not significantly different from $0$ at level $\alpha$.

Also note that a Pearson correlation of $0$ indicates the absence of linear dependence, while a Spearman correlation of $0$ indicates the absence of monotonic dependence.
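The procedure described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not code from the answer: the function names (`pearson`, `average_ranks`, `spearman`, `bootstrap_ci`), the default of 2000 resamples (the answer mentions 100 or 500), the simple order-statistic percentiles, and the synthetic skewed data with n=50 are all choices made here for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one variable is constant
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bootstrap_ci(x, y, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a correlation statistic.

    Resamples (x, y) pairs with replacement, so the pairing is preserved.
    alpha is a proportion here (0.05 corresponds to 5%).
    """
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        r = stat([x[i] for i in idx], [y[i] for i in idx])
        if not math.isnan(r):  # skip degenerate resamples
            estimates.append(r)
    estimates.sort()
    # Crude percentiles taken directly from the sorted estimates.
    lo = estimates[int((alpha / 2) * len(estimates))]
    hi = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lo, hi


# Synthetic, non-normal example data (an assumption for illustration only).
data_rng = random.Random(1)
x = [data_rng.expovariate(1.0) for _ in range(50)]
y = [xi + data_rng.gauss(0, 0.5) for xi in x]

for name, stat in [("Pearson", pearson), ("Spearman", spearman)]:
    lo, hi = bootstrap_ci(x, y, stat)
    print(name, round(stat(x, y), 3), (round(lo, 3), round(hi, 3)))
```

If the printed interval for a coefficient excludes 0, that coefficient would be called significantly different from 0 at the chosen level. Running both statistics on the same resampling scheme makes it easy to see that the bootstrap changes how the uncertainty is estimated, not what each coefficient measures (linear versus monotonic association).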
