I want to show by simulation that the Wilcoxon test is more robust than the Student test for non-normally distributed data

Tags: nonparametric, parametric, r, simulation

I want to test by simulation that the Wilcoxon test is more robust than the Student test for non-normally distributed data.

For example, I'm testing with the uniform distribution and the exponential distribution. I don't know if I simulated it wrong or if I missed something, but I can't find the robustness of the Wilcoxon test compared with the Student test.

I have simulated the populations so that the mean of population A is one point higher than that of population B.

## UNIFORM ------
# population A: Uniform(73, 75), mean 74; population B: Uniform(72, 74), mean 73
populationA <- round(runif(100000, 73, 75), 1)
populationB <- round(runif(100000, 72, 74), 1)
t.test(populationA, populationB)   # the population-level difference in means

plot(density(populationA))
plot(density(populationB))

# sample size per group (run once with n = 5, then again with n = 10)
n <- 5
n <- 10

sub_popA <- sample(populationA, size = n)
sub_popB <- sample(populationB, size = n)

t.test(sub_popA, sub_popB)$p.value
wilcox.test(sub_popA, sub_popB)$p.value

With different sample sizes, I've found that the Student test gets closer to the truth (i.e., it detects the real difference between the populations) more often than the Wilcoxon test.

It is the same for the exponential distribution: I didn't find any superiority of the Wilcoxon test over the Student test, even after repeating the sampling 100 times and counting the number of significant results.

## EXPONENTIAL ------
# population A: Exponential(rate = 1), mean 1; population B: Exponential(rate = 1/2), mean 2
populationA <- rexp(100000, 1)
populationB <- rexp(100000, 1/2)
t.test(populationA, populationB)

sub_popA <- sample(populationA, size = 10)
sub_popB <- sample(populationB, size = 10)
t.test(sub_popA, sub_popB)
wilcox.test(sub_popA, sub_popB)
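
Here is roughly what I mean by repeating the sampling and counting (a minimal sketch; the 5% cut-off and the sample size of 10 are just the values used above):

# repeat the subsampling 100 times and count how often each test rejects at 5%
reps <- 100
t_reject <- 0
w_reject <- 0
for (i in 1:reps) {
    sub_popA <- sample(populationA, size = 10)
    sub_popB <- sample(populationB, size = 10)
    t_reject <- t_reject + (t.test(sub_popA, sub_popB)$p.value < 0.05)
    w_reject <- w_reject + (wilcox.test(sub_popA, sub_popB)$p.value < 0.05)
}
c(t = t_reject, wilcoxon = w_reject)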

Best Answer

With respect to your choices of distribution, the uniform distribution is not something you need to worry about being robust against. In the old days, a common way of simulating a Normal distribution was to average 12 uniform variates, which implies that a t-statistic based on two samples of size six would be close to the desired distribution under the null hypothesis. In fact, the asymptotic relative efficiency of the Wilcoxon to the t-test for the uniform distribution is... $1.0$. (For the Normal, it's $0.955$.)
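
For illustration, here's a quick sketch of that trick in its classic form (summing 12 uniforms and subtracting 6 to center the result; the number of replicates is arbitrary):

# approximate a standard Normal draw by summing 12 Uniform(0,1) variates and subtracting 6
z <- replicate(10000, sum(runif(12)) - 6)
qqnorm(z)   # the points should track the reference line closely
qqline(z)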

Generally speaking (but not always), you want to protect against outliers relative to the "base" distribution (in this case, the Normal); these outliers can be thought of as being generated by a distribution with "fatter" tails than the Normal.

Let's do a more extensive simulation with 10,000 repeats of the procedure (100 repeats is far too few to draw any conclusions except in the most egregious cases), with our base distribution being the fat-tailed $t(3)$ distribution:

library(data.table)

# store the p-values from 10,000 repeats of each test
reject <- data.table(t = rep(0, 10000), w = rep(0, 10000))

for (i in 1:nrow(reject)) {
    # two samples of size 10 from a t(3) distribution, the second shifted down by 2
    x1 <- rt(10, 3)
    x2 <- rt(10, 3) - 2

    reject$t[i] <- t.test(x1, x2)$p.value
    reject$w[i] <- wilcox.test(x1, x2)$p.value
}

# rejection rates at the 1% level
reject[, .(t_reject = mean(t < 0.01), Wilcox_reject = mean(w < 0.01))]

which gives us the following:

> reject[, .(t_reject = mean(t < 0.01), Wilcox_reject = mean(w < 0.01))]
   t_reject Wilcox_reject
1:    0.551         0.625
> reject[, .(t_reject = mean(t < 0.05), Wilcox_reject = mean(w < 0.05))]
   t_reject Wilcox_reject
1:    0.766        0.8414

Clearly favoring the Wilcoxon test.

Now for your Exponential distribution test. Your two Exponential distributions differ in scale, not location; this makes it harder to detect changes in the mean. However, with a larger number of repeats of the experiment, we can still see a difference:

for (i in 1:nrow(reject)) {
    # two Exponential samples that differ in scale: means 1 and 1/3
    x1 <- rexp(10)
    x2 <- rexp(10)/3

    reject$t[i] <- t.test(x1, x2)$p.value
    reject$w[i] <- wilcox.test(x1, x2)$p.value
}

> reject[, .(t_reject = mean(t < 0.01), Wilcox_reject = mean(w < 0.01))]
   t_reject Wilcox_reject
1:   0.1176        0.2313
> reject[, .(t_reject = mean(t < 0.05), Wilcox_reject = mean(w < 0.05))]
   t_reject Wilcox_reject
1:   0.4536        0.4697

If, instead of rescaling the Exponential distributions, we add a location parameter and rerun the tests, testing for differences in location, we get the following:

for (i in 1:nrow(reject)) {
    # two Exponential samples with the same scale but a location shift of 0.5
    x1 <- rexp(10)
    x2 <- rexp(10) + 0.5

    reject$t[i] <- t.test(x1, x2)$p.value
    reject$w[i] <- wilcox.test(x1, x2)$p.value
}

> reject[, .(t_reject = mean(t < 0.01), Wilcox_reject = mean(w < 0.01))]
   t_reject Wilcox_reject
1:   0.0708        0.1283
> reject[, .(t_reject = mean(t < 0.05), Wilcox_reject = mean(w < 0.05))]
   t_reject Wilcox_reject
1:    0.222         0.313

Note also that the t-test will fail when the underlying distributions do not have a finite variance, but the Wilcoxon will not.
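
To see this within the same framework, you could rerun the loop with a Cauchy base distribution (equivalently, a t-distribution with 1 degree of freedom, which has no finite variance); this is just a sketch, with the shift of 2 mirroring the $t(3)$ example above:

for (i in 1:nrow(reject)) {
    # two Cauchy samples (infinite variance), the second shifted down by 2
    x1 <- rcauchy(10)
    x2 <- rcauchy(10) - 2

    reject$t[i] <- t.test(x1, x2)$p.value
    reject$w[i] <- wilcox.test(x1, x2)$p.value
}

reject[, .(t_reject = mean(t < 0.05), Wilcox_reject = mean(w < 0.05))]

Here the Wilcoxon typically holds up much better than the t-test.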