Wilcoxon Rank-Sum Test – When to Use Instead of the Unpaired T-Test

This is a followup question to what Frank Harrell wrote here:

In my experience the required sample size for the t distribution to be
accurate is often larger than the sample size at hand. The Wilcoxon
signed-rank test is extremely efficient as you said, and it is robust,
so I almost always prefer it over the t test

If I understand correctly, when comparing the locations of two unmatched samples we would prefer the Wilcoxon rank-sum test over the unpaired t-test if our sample sizes are small.

Is there a theoretical situation where we would prefer the Wilcoxon rank-sum test over the unpaired t-test even when the sample sizes of our two groups are relatively large?

My motivation for this question stems from the observation that a one-sample t-test, applied to a not-so-small sample from a skewed distribution, yields a type I error rate that differs from the nominal level:

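n1    <- 100      # sample size
mean1 <- 50       # true mean of the exponential distribution
R     <- 100000   # number of simulated samples
P_y1  <- numeric(R)
for (i in seq_len(R)) {
    y1 <- rexp(n1, rate = 1 / mean1)            # skewed sample whose true mean is mean1
    P_y1[i] <- t.test(y1, mu = mean1)$p.value   # one-sample t-test of a true null hypothesis
}
mean(P_y1 < .05)  # for n1 = 100 -> 0.0572, above the nominal 0.05 ("wrong" type I error)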

Best Answer

Yes, there is. For example, any sampling from distributions with infinite variance will wreck the t-test, but not the Wilcoxon. Referring to Nonparametric Statistical Methods (Hollander and Wolfe), I see that the asymptotic relative efficiency (ARE) of the Wilcoxon relative to the t test is 1.0 for the Uniform distribution, 1.097 (i.e., Wilcoxon is better) for the Logistic, 1.5 for the double Exponential (Laplace), and 3.0 for the Exponential.
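The infinite-variance claim can be illustrated with a small simulation. The following is only a sketch under assumed settings (standard Cauchy data, 200 observations per group, a location shift of 1, 1000 replications, all chosen here for illustration and not taken from the references); it estimates the power of both tests at the 5% level.

```r
set.seed(1)

n     <- 200    # per-group sample size (assumed: "relatively large")
shift <- 1      # assumed location shift of the second group
R     <- 1000   # number of simulated experiments (assumed)

# Each replication returns the p-values of both tests on the same data.
p_vals <- replicate(R, {
  x <- rcauchy(n)                     # standard Cauchy: infinite variance
  y <- rcauchy(n, location = shift)   # same shape, shifted location
  c(t        = t.test(x, y)$p.value,
    wilcoxon = wilcox.test(x, y)$p.value)
})

# Proportion of replications in which each test rejects at the 5% level.
power <- rowMeans(p_vals < 0.05)
print(power)
```

Under these assumptions the Wilcoxon test should reject far more often than the t-test, even though each group has 200 observations.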

Hodges and Lehmann showed that the minimum ARE of the Wilcoxon relative to the t-test is 0.864, so you can never lose more than about 14% efficiency by using it instead of the t-test. (Of course, this is an asymptotic result.) Consequently, Frank Harrell's use of the Wilcoxon as a default should probably be adopted by almost everyone, including me.
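The ARE figures quoted in this answer can be checked numerically. For a location-shift model with density f and variance sigma^2, the Pitman ARE of the Wilcoxon relative to the t-test is 12 * sigma^2 * (integral of f^2)^2. The sketch below evaluates that expression with `integrate()`; the helper names `int_f2` and `are` are made up for this illustration, and the parabolic density is the least-favorable case that attains the 0.864 bound against the t-test.

```r
# Integral of the squared density over its support.
int_f2 <- function(f, lower, upper) {
  integrate(function(x) f(x)^2, lower, upper)$value
}

# Pitman ARE of Wilcoxon vs. t-test for a location shift.
are <- function(variance, int_sq) 12 * variance * int_sq^2

laplace  <- function(x) 0.5 * exp(-abs(x))               # variance 2
parabola <- function(x) 3 / (20 * sqrt(5)) * (5 - x^2)   # support [-sqrt(5), sqrt(5)], variance 1

are_values <- c(
  normal      = are(1,        int_f2(dnorm,  -Inf, Inf)),
  uniform     = are(1 / 12,   int_f2(dunif,  0, 1)),
  logistic    = are(pi^2 / 3, int_f2(dlogis, -Inf, Inf)),
  laplace     = are(2,        2 * int_f2(laplace, 0, Inf)),  # symmetric: double the half-line integral
  exponential = are(1,        int_f2(dexp,   0, Inf)),
  parabolic   = are(1,        int_f2(parabola, -sqrt(5), sqrt(5)))
)
print(round(are_values, 3))
```

The normal case (3/pi, about 0.955) is included for reference: even when the t-test's assumptions hold exactly, the asymptotic efficiency loss from using the Wilcoxon is under 5%.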

Edit: Responding to the followup question in comments, for those who prefer confidence intervals, the Hodges-Lehmann estimator is the estimator that "corresponds" to the Wilcoxon test, and confidence intervals can be constructed around that.
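In R, `wilcox.test()` reports the Hodges-Lehmann estimate and a matching confidence interval when called with `conf.int = TRUE`. The sketch below uses made-up data (two exponential samples of size 100, the second shifted by 10) purely for illustration, and compares the reported estimate with the median of all pairwise differences.

```r
set.seed(42)

x <- rexp(100, rate = 1 / 50)        # group 1 (assumed example data)
y <- rexp(100, rate = 1 / 50) + 10   # group 2: same shape, shifted by 10 (assumed)

# Rank-sum test of y vs. x, with the Hodges-Lehmann estimate and a 95% CI.
fit <- wilcox.test(y, x, conf.int = TRUE, conf.level = 0.95)
print(fit$estimate)   # estimated location shift of y relative to x
print(fit$conf.int)   # confidence interval for that shift

# The Hodges-Lehmann estimator is the median of all pairwise differences y_j - x_i.
# For samples this large R uses a normal approximation and a root finder, so
# fit$estimate may differ slightly from this direct computation.
hl_manual <- median(outer(y, x, "-"))
print(hl_manual)
```

Note that the estimated quantity is the median of the differences between the two groups, which is not in general the same as the difference of the two medians.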
