Hypothesis Testing – How to Compare Two Non-Normally Distributed Samples of Different Sizes Using Mann-Whitney or Bootstrap

Tags: bootstrap, hypothesis testing, nonparametric, p-value, wilcoxon-mann-whitney-test

Perhaps this is a very basic question, but I haven't yet found a simple solution to this simple problem:

I want to compare two samples (say X and Y) of a continuous variable that is not normally distributed, and test whether X and Y differ significantly. X has N=81 and Y has N=5110, so the sample sizes are very unbalanced. My first attempt was to use the Mann-Whitney test (i.e. the Wilcoxon rank-sum test). However, I am bothered by the huge difference in sample sizes.

I thought that some kind of randomization or bootstrap method would be a good alternative, but I am not sure whether my approach makes sense. My idea was to draw 1000 random samples of size 81 from both X and Y and compare each pair with the Mann-Whitney test. The empirical p-value would be the proportion of tests with p-value < 0.05. In R, I implemented it as follows:

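X = data1 # sample size 81
Y = data2 # sample size 5110
R = 1000           # number of resamples
alpha = numeric(R) # one Mann-Whitney p-value per resample

for(i in 1:R) {
    group1 = sample(X, replace=TRUE)          # bootstrap resample of X (size 81)
    group2 = sample(Y, size=81, replace=TRUE) # random sample of Y, matched to size 81
    alpha[i] = wilcox.test(group1, group2)$p.value
}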

The empirical p-value would then be the proportion of p-values below 0.05:

mean(alpha < 0.05)

Does this approach make sense? How can I do this hypothesis testing correctly?

Best Answer

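I am not an expert on statistical testing, but the approach you are considering decidedly does not make sense. Imagine that the groups are in fact identical (i.e. the null hypothesis is true). Then the individual tests will give p<0.05 in only a small fraction of the cases (nominally 5%, and p<0.01 in nominally 1%), and those would be false positives. Your "empirical p-value" would therefore be small, and following your logic you would reject the null. Conversely, the more clearly the groups differ, the closer the proportion of significant tests gets to 1, which you would read as no evidence against the null. The quantity mean(alpha < 0.05) is a rejection rate, not a p-value.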
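A small simulation can make this concrete. The sketch below wraps the procedure from the question in a function (`resampled_pvalues`, a hypothetical name) and runs it on simulated exponential data, which is an assumption standing in for `data1` and `data2`: once with X and Y from the same distribution, and once with X strongly shifted. The fraction of significant tests is low when the null is true and close to 1 when it is clearly false, which is the opposite of how a p-value behaves.

```r
# Sketch: the question's resampling procedure, applied to hypothetical simulated data.
resampled_pvalues <- function(X, Y, R = 1000) {
  pvals <- numeric(R)
  for (i in seq_len(R)) {
    group1 <- sample(X, replace = TRUE)                    # bootstrap resample of X
    group2 <- sample(Y, size = length(X), replace = TRUE)  # sample of Y, same size as X
    # exact = FALSE: use the normal approximation (bootstrap resamples contain ties)
    pvals[i] <- wilcox.test(group1, group2, exact = FALSE)$p.value
  }
  pvals
}

set.seed(1)
Y       <- rexp(5110)      # stand-in for data2 (n = 5110)
X_null  <- rexp(81)        # same distribution as Y: the null is true
X_shift <- rexp(81) + 1    # shifted upward by 1: the null is clearly false

frac_null  <- mean(resampled_pvalues(X_null,  Y) < 0.05)
frac_shift <- mean(resampled_pvalues(X_shift, Y) < 0.05)
print(c(null = frac_null, shifted = frac_shift))
```

The exact values depend on the seed, but `frac_shift` ends up far larger than `frac_null`, so treating this fraction as a p-value gets the direction of the evidence backwards.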

I am not aware of any problems with the Wilcoxon-Mann-Whitney test when the groups have different numbers of observations. So one option is simply to run the rank-sum test as usual, without any further complications.
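For reference, running the rank-sum test on groups of very different sizes is a single call to `wilcox.test` in base R. The data below are simulated placeholders (an assumption, not the asker's data), with X shifted upward so the test has something to detect.

```r
# Hypothetical simulated data in place of data1 (n = 81) and data2 (n = 5110).
set.seed(42)
X <- rexp(81) + 0.5   # X shifted upward by 0.5
Y <- rexp(5110)

# Two-sided Wilcoxon rank-sum (Mann-Whitney) test. Unequal group sizes need no
# special handling; unless both groups have fewer than 50 observations,
# R uses the normal approximation by default.
res <- wilcox.test(X, Y)
W <- unname(res$statistic)   # number of (x, y) pairs with x > y (ties count 1/2)
print(W / (length(X) * length(Y)))  # estimate of P(X > Y); 0.5 means no tendency
print(res$p.value)
```

Dividing W by the number of pairs gives an effect size on the probability scale, which is often more informative than the p-value alone when one group is this large.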

However, if you are concerned about the very different $N$, you can try a simple permutation test: pool both groups together (obtaining $81+5110=5191$ numbers), randomly select $81$ values as group A and take all the rest as group B. Then compute the difference between the means (or medians) of A and B (let's call it $\mu$), and repeat this many times. This gives you the distribution $p(\mu)$ of the difference under the null hypothesis. For your actual groups X and Y you have a fixed empirical value $\mu^*$. Now check whether $\mu^*$ lies inside the central 95% percentile interval of $p(\mu)$ (between its 2.5th and 97.5th percentiles). If it does not, you can reject the null with p<0.05 (two-sided).
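A minimal R sketch of this permutation test follows. The function name `perm_test`, the default of 5000 permutations, the choice of the median, and the simulated data are all assumptions for illustration; passing `stat = mean` gives the mean-based version. The p-value line (doubling the smaller tail, with a +1 correction so it is never exactly zero) is one common convention and goes slightly beyond the interval check described above.

```r
# Sketch of a two-sided permutation test for a difference in a location statistic.
perm_test <- function(X, Y, B = 5000, stat = median) {
  pooled  <- c(X, Y)
  nA      <- length(X)
  mu_star <- stat(X) - stat(Y)               # observed value for the real groups
  mu_null <- replicate(B, {
    idx <- sample(length(pooled), nA)        # random group A of size nA (no replacement)
    stat(pooled[idx]) - stat(pooled[-idx])   # group B = all remaining values
  })
  interval <- quantile(mu_null, c(0.025, 0.975), names = FALSE)  # central 95% interval
  p_lower  <- (sum(mu_null <= mu_star) + 1) / (B + 1)
  p_upper  <- (sum(mu_null >= mu_star) + 1) / (B + 1)
  list(
    mu_star  = mu_star,
    interval = interval,
    reject   = mu_star < interval[1] || mu_star > interval[2],
    p_value  = min(1, 2 * min(p_lower, p_upper))  # doubled smaller tail
  )
}

# Usage on hypothetical simulated data (X shifted upward by 1).
set.seed(7)
X <- rexp(81) + 1
Y <- rexp(5110)
res <- perm_test(X, Y)
str(res)
```

Sampling group A without replacement from the pooled values is what makes this a permutation test rather than a bootstrap: every permuted split keeps exactly 81 and 5110 observations, so the unequal sizes are built into the null distribution.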
